Preface



I set up the Talking Heads experiment with a group of brilliant students and collaborators 
at the end of the nineteen nineties. It was intended to be the first large-scale, open-ended 
experiment in the emergence of a shared set of grounded concepts and a vocabulary for 
expressing these concepts by a population of autonomous agents. Inspired by Ludwig 
Wittgenstein, the experiment took the form of a series of language games, more con- 
cretely games of reference about a “world” made up of geometric figures pasted on a 
white board and observable by the agents through pan-tilt cameras. I wanted to demon- 
strate with this experiment earlier breakthroughs in the study of language origins and 
test whether they would hold for large-scale populations and open-ended environments. 
I also wanted to find out how humans would interact with these agents. So we made it 
such that human users, after logging in through the Internet, could teach new words to 
agents or use the words they learned from the agents to play their own language games. 

In 1999, the experiment went live in the context of an exhibition called Laboratorium 
organised in Antwerp (Belgium) by Hans-Ulrich Obrist and Barbara Vanderlinden. After 
a first experimental run from 27 June 1999 to 3 October 1999, the experiment was repeated 
as part of a new exhibition called NOISE organised in Cambridge and London (UK) by 
Adam Lowe and Simon Schaffer from 22 January to 26 March 2000, with additional 
installations at the Palais de la Découverte in Paris and several other places.

On the occasion of the 1999 Laboratorium exhibition, the draft of a book was published
that described the experiment and the underlying theoretical assumptions in considerable
detail. For many reasons, not least that work continued at great speed on
other exciting experiments, the original “pre-edition” of the book never made it to a
fully finished officially published work, and circulated only as an “underground” edition.
This was disappointing because the Talking Heads experiment was an important break- 
through. Moreover the experiment contained the first inklings of mechanisms that since 
then have been worked out, enhanced and tested in many experiments which replicated 
the original results and further enhanced them. The present volume is intended to fill 
this gap. 

Part I of this book contains the original Talking Heads volume which has been only 
slightly edited to correct for minor mistakes. It is a miracle that the original source files 
survived and that the figures could be reconstructed. Part II of this book contains addi- 
tional unpublished background material, reports on different aspects of the experiment, 
including its scientific results, and a brief overview of further developments in language 
evolution research that took place on the basis of the experiment. 

Dozens of people worked on the Talking Heads. The initial research started in 1997 at 
the Artificial Intelligence Laboratory of the Free University of Brussels (VUB) funded by
a “Geconcentreerde Onderzoeksactie” (GOA) of the Belgian government. Joris Van Looveren
worked closely with me on a first prototype that was demonstrated in 1997, using
an active vision system custom-built by Tony Belpaeme and segmentation algorithms 
implemented by Danny van Tieghem. Edwin de Jong did the first theoretical investiga- 
tions of the underlying semiotic dynamics. Once the initiative for a public installation 
was taken, the main hub became the Sony Computer Science Laboratory in Paris, where 
I worked together intensely with Frederic Kaplan and Angus McIntyre, with additional 
contributions for the teleportation infrastructure by Silvere Tajan and Alexis Agahi. The
AI Laboratory of the VUB remained a second hub where important contributions were 
made most notably by Joris Van Looveren, Tony Belpaeme, Holger Kenn and Mario Cam- 
panella. To all of them I am grateful that we were able to create such an extremely excit- 
ing experiment that stimulated many thousands of people to think about language and 
its origins in new ways. 

I also thank the curators of the Laboratorium Exhibition in Antwerp (Hans-Ulrich 
Obrist and Barbara Vanderlinden) and the members of its curating board (Bruno Latour 
and Carsten Holler), the curators of the NOISE exhibition in London (Adam Lowe and Si- 
mon Schaffer), artist Olafur Eliasson, with whom I collaborated on the Look into the Box 
piece at the Musée d’Art Moderne in Paris, artist Anne-Mie Van Kerckhoven for collaborations
for the Chromosophy laboratory in Aachen and the many organisers and helping
hands who made the other installations possible. Sylvia Spruck Wrigley has helped to 
improve the 1999 pre-edition (now Part I of the book) under considerable time pressure 
and Marleen Wynants was crucial in the last phases of this publication with comments, 
professional advice, photographs, and support. 

Part II of the book discusses what happened after the Talking Heads experiment. Our 
research went through various boom and bust cycles. In the good years there was money 
to hire new people and push the research forwards, but then bad years would come and 
the team disintegrated again due to lack of resources. This made progress less substantial 
than it could have been but had the advantage that waves of new young people were
given a chance to contribute. The first wave started working after the Talking Heads 
experiments were finished. At Sony CSL there was first exciting work by Pierre-Yves 
Oudeyer and Frederic Kaplan pursuing the origins of turn taking and symbol usage with 
the AIBO robots, thus exploring even earlier stages in the origins of language.

Thanks to the FP6 ECAgents project and the FP7 ALEAR project of the European
Commission, a new team could be formed around 2004, which included Joris Bleys, 
Joachim De Beule, Bart de Vylder, and Jelle Zuidema at the VUB in Brussels. Mean- 
while, we got access through the Sony Computer Science Laboratory to the QRIO hu- 
manoid robots thanks to Masahiro Fujita and Hideki Shimomura. A new team formed 
in Paris which included Nancy Chang, Katya Gerasimova, Martin Loetzsch, Vanessa Mi- 
celli, Michael Spranger, Simon Pauw and Remi Van Trijp. I thank them all for major 
contributions to the experiments reported in the second part of the book. I also thank 
Stefano Nolfi for coordinating the ECAgents project with a firm hand and Manfred Hild 
and his team for creating the new humanoid myon robot that we used in later experi- 
ments. 




In 2008/2009 I was a fellow at the Wissenschaftskolleg in Berlin, which was an ideal 
environment to reflect on our research program and how we could proceed. Many new 
ideas came out of that year, in particular, a realisation that for several of the key puzzles 
we were trying to solve, inspiration could be found in evolutionary biology, and this led 
to a new path which only now is beginning to be explored. It is good to know that after a 
major collapse in funding around 2009, there is again a team of young people assembling 
to push research on language games further. They include Emilia Garcia Casademont in 
Barcelona, Miquel Cornudella and Paul Van Eecke in Paris, and Yana Knight in Brussels. 
Financing remains precarious but the future is in their hands! 

The present new edition is not only special because of the historical significance of 
the Talking Heads experiment but also because it is the first one in a new series “Compu- 
tational Models of Language Evolution” published by the Language Science Press. The 
intention of this series is to make available through Open Access in-depth models of lan- 
guage evolution that have been validated using agent-based computational simulations. 
I am grateful to Martin Haspelmath and Stefan Müller (the editors of the Language Science
Press) for making it possible to publish in Open Access research results which do
not quite fit in the standard mode. I thank Maria Ferrer Bonnet for help in the final stage 
of adapting bibliographies. And I am grateful to ICREA for time to create a revised and 
extended version of the Talking Heads Book and to the Institut de Biologia Evolutiva 
in Barcelona for providing such an excellent working environment. I thank in addition 
Remi van Trijp for his help in making this series a reality and Annemie Maes for her 
encouragement in the final stretches to finish this book. 



Barcelona, October 2014. 
1 Introduction



Inquiring about the origins of a phenomenon, as opposed to merely describing its present 
state, often leads to profound new discoveries. Biology provides a wealth of examples. 
Darwin asked the question of the origins of species diversity and thus discovered evolu- 
tion by natural selection. Pasteur wondered about the origins of life and thus discovered 
the role of bacteria in human diseases. My approach in this book is going to be similar. In 
order to advance our understanding of human language and cognition, I propose to address
the fundamental question of how language and meaning might ever have originated.
I will not do this by a historical reconstruction, nor by an empirical investigation of child
language acquisition or by examining data from the birth of new languages.¹ Rather, I
will pose the question in a completely general way: how can a physically embodied au- 
tonomous agent arrive at a repertoire of categories for conceptualising his world and 
how can a group of such agents ever develop a shared communication system with the 
same complexity as human natural languages? 



1.1 The Talking Heads experiment 

For centuries, philosophers, linguists, psychologists and neuroscientists have been grappling
with the amazing capacities of the mind. By necessity, they have been doing this
through thought experiments or by observing human behaviour and brain anatomy. Although
this has generated a wealth of insights,² everyone involved in this research must
surely agree that we are still lacking adequate models, particularly for higher order cog- 
nitive functions like language processing, and definitely for understanding how such 
functions may have arisen. To discover or test such models, it therefore remains useful 
to do experiments with artificial systems. We can build robots that receive sensory in- 
puts through a camera or other sensors, give them computational power and memory, 
and empower them for action in the world by adding actuators. Given such a set-up, we 



1 There has been a renewed interest over the last five years in the question of the origins of language from these
various perspectives. Hurford, Studdert-Kennedy & Knight (1998) contains a representative sample of the 
most recent work. Other samples of recent research can be found in Hawkins (1992) and Velichkovsky & 
Rumbaugh (1996). 

2 Attempts are made to bring together the insights from various disciplines to establish a true “cognitive 
science”. Osherton (1995) contains an introduction to some of the main research trends in this very
diverse scientific field. See also Luger (1994). The work reported in the present book can be classified 
as theoretical cognitive science because I try to formulate and test the operational adequacy of possible 
models for the origins of language and meaning but do not claim nor give any evidence that these models 
are also valid for human cognition, just as the study of aerodynamics and aircraft design may help to 
understand how birds can fly but is more generic than its biological implementations. 
can precisely examine the operational adequacy of a hypothesis. For example, if some-
one proposes a process for segmenting images, we can test this process by capturing 
streams of images through a camera and see whether the process is indeed capable of 
performing segmentation. When someone proposes a process for parsing sentences, we 
can implement this process and confront it with a series of example sentences to mea- 
sure success and failure. How else can we test the operational adequacy of a proposed 
cognitive model, given the enormous complexity involved? Of course, building an artifi- 
cial system in no way proves that the principles that were used to construct it are valid 
for natural systems. But it is an enormously valuable source of insight for approaching 
the extraordinarily complex phenomena observed in human cognition. 




Figure 1.1: Two Talking Heads, a1 and a2, each seeing the scene on the white
board from a slightly different viewpoint.

The Talking Heads experiment follows this research strategy. It features an enormously
challenging experimental infrastructure to explore how a cognitive system, like
the one underlying human language, might be able to bootstrap itself in interaction
with other cognitive systems, driven by increasing challenges from the environment.
The experiment involves a set of robotic “Talking Heads” engaged in language games, 
with each other or with human interlocutors, about real world scenes they perceive 
through their sensors (see Figures 1.1 and 1.2). The robots use vision as their major sensory
source. They are located in different places in the world and connected through the Internet.
Two robotic agents can only engage in an interaction when they are instantiated in
robot bodies in a shared physical environment. After an exchange, an agent can teleport 
himself to another body in another location and engage in interactions there. 

The agents’ categorisations of the world and their language are not programmed but
emerge. They are constructed and learned by the agents themselves. The more interactions
they have with humans, the more they adopt our concepts and language. Interacting
with the Talking Heads is a bit like interacting with two-year-old twins; they play
Figure 1.2: A single “Talking Head”. There is a camera oriented towards a white board.
The bottom screen shows what the camera observes. The top right screen
shows the result of processing. A loudspeaker reproduces the utterances of
the agent.



most of the time with each other and develop their own language in the process, but the 
more humans engage in interaction with them, the more the language resembles existing 
human languages.³

Although the agents invent their own language and conceptualisations of the world, 
we had to program a basic cognitive architecture into the agents. This architecture is 
based on a set of relatively simple, biologically plausible mechanisms, which nevertheless
give rise to enormous complexity. The goal of the experiment is to examine the
explanatory power of these mechanisms: What phenomena do they cause and hence 
explain? 

1.2 The main hypotheses 

The Talking Heads experiment is first and foremost a scientific experiment. It subjects 
four radical ideas to experimental scrutiny. The first idea is that language emerges 
through self-organisation out of local interactions of language users. It spontaneously 
becomes more complex to increase reliability and optimise transmission across genera- 
tions of users, without a central designer. I call this the selfish language hypothesis: 



3 Often twins develop a private language, particularly if they do not interact much with adults. Compared to 
other children, twins are usually 6 months behind in their language development but that delay and also 
the private languages that go with it disappear by the age of 8. 
Language colonises brains and recruits available cognitive capacities to satisfy its appetite
for expressing ever more complex meaning with minimal effort and maximum
effectiveness.⁴ Language is not a uniform abstract system of rules (and definitely not an
innate system of rules) but a creative open-ended complex adaptive system, like a natural 
ecology, in which certain solutions to relate forms with meanings become temporarily 
conventionalised in the community, even though new creative solutions emerge almost 
any time someone speaks. Language is in constant flux because new meanings contin- 
uously arise and existing forms undergo change beyond the control of any individual 
language user. 
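
To make the idea of self-organisation out of local interactions concrete, the following is a minimal sketch of a simplified naming game in Python. It is an illustration under stated assumptions, not the architecture used in the Talking Heads experiment: agents are plain dictionaries, the hearer is assumed to know the topic (as if the speaker had pointed at it), and all names (`play_naming_game`, `lexicons`, the word format) are hypothetical.

```python
import random


def play_naming_game(num_agents=10, num_objects=3, rounds=5000, seed=0):
    """Simplified naming game: agents align on words through pairwise games only."""
    rng = random.Random(seed)
    # Each agent keeps, for every object, a set of candidate words.
    lexicons = [{obj: set() for obj in range(num_objects)}
                for _ in range(num_agents)]
    successes = []
    for _ in range(rounds):
        # Pick two distinct agents and a topic at random (a purely local interaction).
        speaker, hearer = rng.sample(range(num_agents), 2)
        topic = rng.randrange(num_objects)
        words = lexicons[speaker][topic]
        if not words:
            # The speaker has no word for the topic yet and invents one.
            words.add("w%06d" % rng.randrange(10**6))
        word = rng.choice(sorted(words))
        if word in lexicons[hearer][topic]:
            # Success: both agents discard competing words for this object.
            lexicons[speaker][topic] = {word}
            lexicons[hearer][topic] = {word}
            successes.append(True)
        else:
            # Failure: the hearer adopts the word as a new candidate.
            lexicons[hearer][topic].add(word)
            successes.append(False)
    return lexicons, successes
```

Run with the defaults, the share of successful games in the last few hundred rounds is typically much higher than in the first few hundred, even though no agent ever sees the vocabulary of the whole population and no central designer assigns words.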

The second radical idea is that meaning is built up slowly by each individual in a 
cumulative growth process. Meaning is not innate, as rationalists in the footprints of 
Plato have been arguing for centuries, nor learned through stepwise induction from 
examples and counterexamples, as empiricists have been saying. Meaning is at first 
very concrete and strongly situated in the environment and bodily experiences. I will 
take the suggestions made by Wittgenstein one step further, namely that meaning (and 
language) is constructed and practised as part of language games.⁵ I will introduce a
selectionist approach to the acquisition of meaning, introducing models to show that 
conceptual distinctions can “grow” in the brain like the leaves and branches on a tree and 
be pruned to fit the demands and characteristics of the environment in which an agent 
finds itself. Even though non-verbal activities, like predicting the future based on a model
or deciding what to do in specific circumstances, stimulate the growth of distinctions,
language use is probably one of the greatest stimulators of conceptual growth. It provides 
feedback about which distinctions were successful in linguistic communication and thus 
whether or not they should be preserved. Thus language and cognition co-evolve. Each 
one pushes the other up towards more complexity and they become tightly co-ordinated 
with neither a central co-ordinator nor prior innate design. 
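
The image of distinctions that grow and are pruned can be sketched in code. The following Python sketch is an illustrative assumption rather than the mechanism used in the experiment: it uses a single made-up sensory channel with values in [0, 1), a binary tree of intervals as categories, only the leaves of that tree for discrimination, and a crude pruning rule. All names (`Node`, `discrimination_game`, `run`) are hypothetical.

```python
import random


class Node:
    """One category: a half-open interval [lo, hi) on a sensory channel."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.children = []   # empty for a leaf
        self.successes = 0   # how often this category singled out a topic

    def leaf_for(self, x):
        """Return the most specific category that contains x."""
        node = self
        while node.children:
            left, right = node.children
            node = left if x < left.hi else right
        return node

    def split(self):
        """Grow: refine this category into two narrower ones."""
        mid = (self.lo + self.hi) / 2
        self.children = [Node(self.lo, mid), Node(mid, self.hi)]

    def prune(self):
        """Prune: drop pairs of leaf distinctions that never proved useful."""
        for child in self.children:
            child.prune()
        if self.children and all(
                not c.children and c.successes == 0 for c in self.children):
            self.children = []

    def count_leaves(self):
        if not self.children:
            return 1
        return sum(c.count_leaves() for c in self.children)


def discrimination_game(root, topic, others):
    """Succeed if the topic's category contains no other object in the context."""
    leaf = root.leaf_for(topic)
    if any(leaf.lo <= v < leaf.hi for v in others):
        leaf.split()          # failure triggers growth of a new distinction
        return False
    leaf.successes += 1       # success reinforces the distinction
    return True


def run(games=2000, context_size=3, prune_every=500, seed=1):
    rng = random.Random(seed)
    root = Node(0.0, 1.0)
    outcomes = []
    for g in range(1, games + 1):
        values = [rng.random() for _ in range(context_size)]
        outcomes.append(discrimination_game(root, values[0], values[1:]))
        if g % prune_every == 0:
            root.prune()
    return root, outcomes
```

In this sketch the very first game always fails, because the single initial category covers every object. Later games tend to succeed more often as the tree adapts to the contexts the agent actually encounters.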

A third idea concerns the characteristics of cognitive architectures. For centuries, the 
human cognitive system has been likened to a machine, most recently to the computer 
as an information processing machine.⁶ Although there is a lot to be said for adopting such
a viewpoint, I will instead emphasise biological metaphors. Specifically, I will defend 
the idea that a living ecology is a better metaphor for a realistic cognitive system. In an 
ecology, there is constant change as the individual organisms adapt themselves to the 
physical environment and to other organisms sharing the same environment. There is 
evolution by selection so that successful adaptations survive and others disappear. There 
are failures but also repair processes happening at all levels of the ecological hierarchy. 
These various characteristics inspired the artificial architectures used in the experiments. 

A fourth radical idea concerns the nature and origins of grammar. Rather than invoking
the need for a highly specialised genetically determined language organ,⁷ I believe

4 See Deacon (1997) for a discussion of co-evolution between cognitive and linguistic capacities and brain 
structures. 

5 Wittgenstein (1953) emphasises the relativity of concepts and the role of language and hence meaning in 
social interactions. 

6 See Newell & Simon (1976). 

7 As strongly argued by Chomsky in various writings, for example Chomsky (1986). Even though Chomsky 
that grammar spontaneously arises when generic capabilities to categorise reality, store
past events in terms of abstract schemas, remember associations between events, etc., 
reach a critical level and are applied to language itself. These capabilities are relevant 
across many different cognitive domains. In order to store linguistic experiences, human 
memories spontaneously structure them, thus introducing abstract schemas, internal 
categories, and roles that substructures can play in schemas. These organisational ele- 
ments then become externalised. Categories are marked by enriching the form of words, 
schema boundaries are marked by imposing patterns on the expression of a schema, 
roles are marked by assigning them to specific positions in a pattern. This externalisation 
increases the reliability in communication because it reduces ambiguities and supplies 
additional context. It also aids the stable transmission of the language from one genera- 
tion to the next because the language learner gets additional hints to guess the meaning 
and function of unknown words and constructions. Once such structuring devices are 
in place, they help to increase the expressive power of the language. The complexity 
of possible language interactions can increase, and words or word groups which had 
multiple usages can become specialised for particular semantic functions. 

The various language construction processes gradually shaping a full-blown language 
are not under the conscious control of individuals but instead constitute a collective 
enterprise. Language users structure and restructure their language and thus increase 
its systematicity, but there are also forces causing a breakdown of systematicity, such 
as erosion of a form through sloppiness of pronunciation, in turn causing a grammati- 
cal regularity to break down. The interplay between these constructive and destructive 
forces helps to explain the constant evolution of language and the growing diversity 
among languages emanating from the same source, such as French, Italian, and Spanish 
from Latin. 



1.3 A bottom-up approach to artificial intelligence 

My daughter Lenie grew up surrounded by computers and robots which her obsessed 
father was trying to infuse with artificial intelligence. When she was twelve, I asked her 
whether she thought any of the machines or programs she had seen were intelligent. She 
said no, someone had programmed them, so they were not intelligent themselves. The 
programmer was intelligent, not the machine. Indeed, this is true. 

In 1996, a computer program called Deep Blue defeated the reigning world champion 
Kasparov in a game of chess (Newborn 1996). Kasparov was astonished and depressed, 
and claimed this was a defeat of humans in the race against machines. But was he ac- 
tually beaten by artificial intelligence? Not really. A team of engineers and scientists 
from Carnegie Mellon University and from the IBM Watson Research Center had been 
working for ten years to program vast amounts of chess knowledge, invented by human 
experts, into Deep Blue. They had built extremely sophisticated dedicated computing 

argues for an innate language acquisition device he has expressed scepticism about evolution by natural 
selection as an explanation for the origin of this device (and therefore of language). A genetic theory of 
language evolution has been suggested by Pinker (1994). 




hardware to apply this knowledge at a blinding speed. So, in beating Kasparov, other 
humans were the clever ones, not machines. 

Recently, the whole world looked on in fascination for several weeks as a small robot, 
the Rover Sojourner, ventured out on Mars, navigating through the rocky landscape, 
collecting samples, taking pictures and performing experiments.⁸ Was this a first sign of
artificial life? Even though the behaviour of the robot has some apparent characteristics 
of living systems, people more familiar with the project would say no. The robot was 
hardly autonomous; it continuously had to rely on signals coming from human engineers 
in order to set its next targets, or to deal with unforeseen circumstances. The robot’s 
behaviours were all human designed and carefully programmed. The robot itself was 
in no way adaptive. It did not learn new behaviours nor new interaction modes, as a 
living system would do. It was critically dependent on human engineers whenever its 
functionality needed to be extended or modified. 

This in no way diminishes the achievement of building these artificial devices, but
it does show that we have to be careful in ascribing mental or biological qualities
to machines. Despite the hype generated by the media and the occasional researcher 
taking his dreams for reality, intelligence and life remain very much the property of nat- 
ural rather than artificial systems. In a way, our powerful engineering methodologies 
make it too easy to succumb to a strategy of programming directly the human or animal 
behaviours we observe and interpret as being intelligent. But in doing this, we keep simulating
the end products of intelligence rather than getting at the heart of intelligence itself.
We put our own human concepts explicitly in the machine instead of implementing the 
mechanisms that enable an artificial agent to acquire new categories itself, implement- 
ing by hand a fixed set of predetermined behaviours which we believe the agent should 
have, rather than supplying mechanisms that allow the agent to acquire new behaviours 
when faced with unforeseen circumstances, and so on. Things are done this way because 
we simply do not know how to do them otherwise. 

The goal of the fundamental research reported in this book is not only to raise some 
profound fascinating questions about language, but also to lay the groundwork for an 
alternative bottom-up approach towards artificial intelligence. In this approach, the hu- 
man designer does not put his or her language and concepts into the computer, but tries 
to set up systems that autonomously generate their own. Indeed, if we have scientific 
models which explain how language originates, both in a language community and in 
new individuals born into a community, we should be able to operationalise these mod- 
els and show that they work on autonomous robotic agents. This is exactly what the 
Talking Heads experiment tries to accomplish. 



1.4 History of the project 

The Talking Heads experiment is the culmination of one of the most exciting scientific 
and engineering projects I have ever been involved in. It has required the creative efforts 



8 See Wunsch (1998) (a book for children!). 
of a dozen excellent researchers over many years. The story started in 1985. Instead of
continuing to design and program intelligence explicitly based on formalising human 
cognitive capacities, as most of my colleagues in artificial intelligence research labs were 
doing,⁹ I started to focus on the question of how intelligence might originate and evolve
in physical agents as they interact autonomously with their environment or with other 
humans, and I encouraged my students at the Artificial Intelligence Laboratory of the 
University of Brussels (VUB) to experiment in the same direction. We initially developed 
a bottom-up, behaviour-oriented approach to sensori-motor intelligence, which was also 
being explored around the same time by Rodney Brooks at the MIT Artificial Intelligence 
Laboratory. The behaviour-oriented, bottom-up approach was a counter reaction to the 
symbolic, top-down approach of earlier AI research. See Steels & Brooks (1995) and Arkin 
(1998). We built robots of various sizes and shapes, using simple electronic circuits, Lego 
bricks, small motors, rechargeable batteries, self-made sensors, single board computers, 
and everything else that appeared useful (Figure 1.3). 




Figure 1.3: Example of a Lego vehicle built by Tim Smithers based on Lego bricks, 
a sensori-motor processing board, and a variety of sensors and actuators. 
We developed these robots in the early nineties for exploring a behaviour- 
oriented approach to robotics. 

Most robots drove around on wheels, but we also used balloons and propellers to build 
flying robots and experimented with a fish-shaped robot which swam in the university 
swimming pool by wagging its tail (see Figure 1.4). 

To investigate the role of the environment in shaping the evolving sensori-motor 
capacities of these robots, we built various robotic ecosystems in which robots could 
recharge themselves but also had to work for their living by dimming lights that took 
away energy from the total energy flowing in their ecosystem (see Figure 1.5). Visitors 

9 Some recent overviews of this “classical” approach to artificial intelligence can be found in: Nilsson (1998) 
and Russell & Norvig (2003). 
Figure 1.4: “Artificial fish” built by Miles Pebody to explore influence of robot bodies on
behaviour. The fish could swim around by wagging its tail and avoid obstacles 
based on infrared sensing. 



to our lab could see robots helping each other or engaging in fierce competition for the 
resources available for survival. In all of this research, we tried to see how far behaviours 
would autonomously evolve, in other words we tried to find mechanisms by which the 
robots would bootstrap themselves towards greater sensori-motor complexity. One of 
the main lessons from these experiments was that explanations for cognition lie partly 
outside the brain of the individual agent: The environment, the body, the sensori-motor 
apparatus and the behaviour of the other agents all partly shaped their capacities and 
further development.¹⁰

Many fascinating tales can be told about this research but the intelligence being exhib- 
ited by these autonomous robots hardly seemed worthy of the name. Yes, they learned 
by themselves how to avoid obstacles, how to recharge themselves in a charging station, 
or how to co-ordinate efforts to exploit the resources in the ecosystems we had built for 
them. But critical observers did not see much more than rat intelligence and they were 
right. The original goal of reaching human cognitive levels, as observable for instance in 
expert problem solving or conversations in natural language, remained elusive. The tra- 
ditional artificial intelligence approach of explicitly programming symbolic intelligence 
still gave far superior performance in tasks requiring cognition. Clearly, essential theo- 
retical concepts were missing for a truly bottom-up approach to succeed. 

In the summer of 1995, a clear breakthrough occurred. I was working as a visiting 
researcher in the Sony Computer Science Laboratory in Tokyo, invited by its director 
Mario Tokoro. Reflecting on our experiments from a distance, two new ideas occurred 
to me. First of all, language may have been the missing key in the initial experiments. 
Language may be a necessary route by which the human cognitive system bootstraps 
itself autonomously, in tight interaction with the environment and aided by a community 
of other language speakers. This suggested that if we wanted to have emergent forms 
of cognitive intelligence, we needed to go the same route. Second, the principles and 
mechanisms that had been pouring out of the study of complexity had to be relevant 
to understanding the origins and evolution of language, because they provided generic 

10 This approach is also known as situated cognition; see Clancey (1997) and Varela, Thomson & Rosch (1991).
Figure 1.5: This ecosystem was designed together with David McFarland (Oxford University)
to explore emergent cooperation and competition. The robots, in the form
of Lego vehicles, could recharge themselves in the charging station (shown in
the top right corner). But they had to ensure that there was enough energy in
the charging station by dimming lights housed in black cylinders, which they did
by pushing against them.



explanations for how complexity may emerge. These principles include self-organisation, 
structural coupling, selectionism, level formation, and many others (Nicolis & Prigogine 
1989). The field of artificial life brings together researchers exploring the insights of 
complex systems with computational and robotic experiments (see: Langton 1995). 

Together with other researchers interested in the then-arising field of “artificial life”, 
I had already been simulating path formation in ant societies and other biological phe- 
nomena exhibiting an emergence of complexity. It dawned on me that the importance of 
these mechanisms for bootstrapping intelligence and language might be much greater 
than thought so far. 11 



11 The following reference provides a general survey of similar work in the area of lexicon formation: Steels
(1997b). A representative sample of work on syntax is Briscoe (1999). 
Back in Brussels, an extremely exciting but very intense period of research started as
I tried to apply these principles to language and subject them to experimental scrutiny. 12 
We very quickly built a first prototype of a Talking Head with our own hardware and 
low-level software (see Figure 1.6) and experimented with language games on mobile 
robotic agents. 13 New fundamental insights and discoveries emerged almost daily. The 
first experiments, particularly in grounding language on real robots, were extremely 
difficult but we were clearly making steady progress. 




Figure 1.6: First prototype of a Talking Head camera with associated electronics built by 
Tony Belpaeme. The camera is capable of tracking moving objects.

As my research programme grew more radical, it became more and more difficult 
to get funding. European research and development programmes were increasingly 
demanding short-term projects that targeted information technology products already 
available on the American or Japanese market. Increasingly, my research proposals were 
being rejected and running projects were cut off, as reviewers could not see direct short- 
term commercial benefits. All this was endangering the further existence of my Brussels 
laboratory, whereas I, paradoxically, thought that our research had never been more 
promising. Fortunately, at this critical moment, Mario Tokoro helped me again in a
crucial way. He understood what I was hoping to achieve and ensured secure and stable
resources from the Sony Corporation. At the end of 1996, I consequently set up a new 
research structure in Paris, a spin-off from the Sony Computer Science Laboratory in 



12 The earliest papers on these mechanisms are in: Steels (1995) and Steels (1997a). 

13 The electronics and tracking software for this active camera were built by Tony Belpaeme. The mobile 
robot experiments were conducted with Paul Vogt. Other early work on the grounding and autonomous 
acquisition of language-like communication systems by robotic systems is described in Steels & Vogt (1997).
See also: Billard & Dautenhahn (1999). 
Tokyo, where the bulk of the research reported in this book could be done in almost
ideal circumstances. The Paris research was complemented with many important contri- 
butions from my graduate students at the VUB AI Laboratory in Brussels. 



1.5 Beyond Turing 

A scientific experiment creates, in a controlled and repeatable way, phenomena which 
shed light on similar phenomena observed in the natural world. Initially there is always 
some discussion about the relation between the artificially created phenomena and the 
natural phenomena. Galileo dropped cannon balls from the tower of Pisa and claimed 
that these experiments validated a general theory of falling bodies; but his contemporaries
objected that birds gracefully landing on a roof can also be seen as falling bodies,
so how general was his theory really? 

When dealing with cognitive phenomena such as language and meaning, the same 
problem arises. To what extent do the languages constructed by the Talking Heads count 
as languages? Does it make any sense to say that these robotic agents categorise their 
world? Do the Talking Heads learn? Do they genuinely understand each other? Do they 
understand us? To what extent is there a real increase in syntactic complexity? Intense 
discussions about whether cognitive phenomena can be recreated in artificial systems 
have raged ever since researchers began exploring this route, and they have proved
extremely difficult to resolve to everyone’s satisfaction.

It seems unavoidable that there is disagreement because judgements about cognitive 
capabilities are to some extent subjective. Most people ascribe more intelligence to their 
pets than a casual observer is willing to admit. In any case, judgements should clearly 
rest with humans, and definitely not with the designers who are all too keen to call their 
systems intelligent. This was clearly recognised by Turing when he devised his famous 
Turing test. 14 Turing started from a popular parlour game of his time, in which an observer
had to tell through a dialog whether he was talking to a man or a woman. He proposed 
to call a computer program intelligent when it was capable of playing the role of a man 
or woman so well that the observer could no longer tell whether a person or a computer 
program was playing the game. The Turing test has rightfully been criticised as being 
on the one hand too difficult, because it is completely open-ended, and on the other 
hand too narrow, because it does not incorporate important aspects of intelligence such 
as learning or sensori-motor intelligence. The only way to have a decent showing in 
the Turing test is to cheat, i.e. to let the computer mimic intelligence by manipulating
symbolic patterns without any notion of what they mean. It is therefore desirable to
have an alternative setup which still preserves some of Turing’s original ideas.

The goal of the Talking Heads experiment is not to demonstrate an artificial intelli- 
gence with the same capacities as human intelligence, but to perform scientific experi- 
ments so as to examine aspects of a theory of the origins of language and meaning. How- 
ever, as in Turing’s proposal, the public should be the ultimate judge whether cognitive 



14 The Turing test is originally described in Turing (1950). 
phenomena are taking place, and the artificial agents should be able to play their role in
interaction with humans. Sound scientific methodology requires that the experimental 
apparatus is a white box which can be fully probed by anyone who wants, that the ex- 
periments are repeatable, and that the phenomena that are generated (for example, the 
complexity of the lexicons or grammars) can be compared by anyone to human cognitive 
phenomena in order to gauge their similarity and thus their relevance for understanding
human cognition. 

In 1999, a golden opportunity presented itself to expose our theories and systems to 
public scrutiny and thus solicit judgements from a wide range of human observers. Bar- 
bara Vanderlinden and Hans-Ulrich Obrist, two internationally renowned young cura- 
tors, had been put in charge of an important art event in the city of Antwerp (Belgium) 
and made the brilliant decision to organise a confrontation/co-operation between art 
and science. 15 They invited artists and scientists to set up a public laboratory in the city 
and conduct experiments from the viewpoint of their discipline. This is how the Talking 
Heads experiment came to be installed in a public space in Antwerp and how thousands 
of people, some more bewildered than others, took part in the first ever large-scale pub- 
lic experiment in artificial intelligence. Everybody was encouraged to interact with the 
robots and to try and understand what was going on. This was not an obvious thing 
to do because the Talking Heads construct their own language and their own conceptu- 
alisation of the world. Understanding what they are talking about resembles the work 
of an anthropologist who is studying the language and conceptualisations of a newly 
discovered tribe living secluded in the rainforest. 

The Laboratory for cognitive robots and teleportation that housed the Talking Heads 
experiment also contained a documentation room in which the audience could get ad- 
ditional background information and provide feedback and commentary on the exper- 
iment. We created a website that was accessible worldwide through the Internet. This 
allowed viewers from anywhere in the world to follow the dialogs, inspect the lexicons 
and ontologies (sets of perceptually grounded concepts) of the robots, and even interact 
remotely with the physical robots, playing language games and doing their own experi- 
ments. We also added other physical sites in Tokyo, Brussels, Paris, Amsterdam, San Jose 
(US) and other places, to increase the environmental complexity that the agents could 
experience. 

The massive response and thoughtful judgements of the public were crucial to validate 
many aspects of the theories put forward in this book. But the interactions with a broad 
public and the intense discussions that it generated also added new dimensions to the re- 
search. First of all, a whole new type of interface between man and machine was taking 
shape under our very eyes. In contrast to pre-programmed computer interfaces, which 
more often than not make it difficult to do what one wants, the Talking Heads demon- 
strated for the first time the concept of negotiated user interfaces. The interaction 
was based on mutual respect and adaptation of man and machine. Communicative fail- 
ure was not fatal but an opportunity to fine-tune and negotiate the way communication 
would take place in the future. 



15 The catalogue of this event (Obrist & Vanderlinden 1999) gives an idea of the other laboratories and the 
coming together of art and science. 
Secondly, the Talking Heads experiment turned out to be an ideal learning environ-
ment for raising philosophical issues. Children and adults alike started to ask questions 
about the nature of meaning, the relation between language and reality, the mind-body 
problem, the origins and evolution of language, consciousness, social identity, and so on. 
They started to play language games among themselves and some of them reported pro- 
found changes in the way they think about language. If these reflections create greater 
tolerance towards other languages and the conceptualisations of the world they implic- 
itly embody, then I consider the Talking Heads experiment of great societal value, irre- 
spective of the scientific and engineering breakthroughs the project has generated. 



1.6 The book 

This book describes in detail the rationale behind the Talking Heads experiment, the 
mechanisms that make it all work, the ontologies and languages the agents develop, 
and what happened when the Talking Heads were exposed to public scrutiny. It studies 
processes which must already be active in the very first stages of language use in the 
child, and must also have been present in the early phases of human language genesis. 

The main text contains the principled line of the argument in a form which is intended 
to be generally accessible without compromising exactness. The notes after each chapter 
contain references to other work, as well as details or additional material of relevance 
to the specialist. Each chapter also contains a set of references to literature on the same 
topic. 

This book starts with a preview of the experiment and a brief illustration of the core 
ideas (Chapter 2). I then cover the different tasks step-by-step that speakers and hearers 
must carry out: perception (Chapter 3), conceptualisation (Chapter 4), and lexicalisation 
(Chapter 5). Each chapter discusses the architectural components with which the Talking 
Heads have been endowed and the kinds of cognitive structures that the agents generate 
in interaction with the environment and the other agents. Chapter 6 and Chapter 7 then
bring all these results together and test the rich, complex semiotic dynamics that arise
when the Talking Heads effectively interact with real world environments. 

The research discussed in this book is far from finished. It is still science in the making. 
Many cognitive structures and capabilities, even very elementary ones arising during the 
first years of human life, have yet to be unveiled. Many language issues have not been covered
yet. The current set-up shows only inklings of what future forms of man-machine inter- 
action might be like. Nevertheless, I believe that the hypotheses proposed in this book, 
and the methodology of experimentation that has been used to explore these hypotheses, 
open new avenues for a scientific understanding of the human mind. Building artificial
systems that exhibit cognitive capabilities does not de-humanise the mind, in the same 
way as the telescope does not demystify the cosmos. We see much more and what we 
see is infinitely more beautiful and impressive. 
When people see the Talking Heads for the first time, they are stunned. It takes a while
before one gets used to the self-generated movements of each robot, the strange dia- 
logues in an incomprehensible language, the graphs plotting the evolution of their inter- 
nal states, and the colourful environment which is the subject of their language games. 
But after some time, almost everyone gets involved in the game and attempts to figure 
out the language the robots have developed or to teach them his own. Some people come 
back day after day to follow the progression of the language and conceptualisations that 
the Talking Heads build in collaboration with interacting observers. Children are the 
first ones to start playing with neither fear nor preconception. 

This chapter explains the general setup of the experiment and gives a rough idea of 
what is going on. The various principles and mechanisms at work will be discussed in 
more detail later and I will also give many more examples taken from concrete interac- 
tions. 



2.1 The main components 

Clearly in the development of language and meaning, the group and the environment 
matter. A child who grows up without a caring community or without sufficient envi- 
ronmental stimuli never develops the rich cognitive capacities normal adults have. From 
attempts to educate wolf children who grew up in isolation from a human community,
or impaired children for whom the intensity of early interactions is limited, we know
that there are critical periods during which a community and a challenging environment must
be present; otherwise the child’s capacities for language are damaged for the rest of his
or her life. 1

But how can we sufficiently recreate these social conditions in experiments with ar- 
tificial systems? Building colonies of physical autonomous robots roaming the world in 
search of stimulating environments and rich interactions with other robots is not feasi- 
ble today. So how can we ever test seriously situated and socially embedded approaches 
to cognition? 

2.1.1 Teleporting 

Let me make a distinction between the physical aspects of a cognitive agent and the 
mental aspects. The physical aspects include the agent’s body, the sensors and articula- 
tors, the physical location, the objects in this location, and the other agents physically 



1 See Tager-Flusberg (1994). 
present in the environment. The mental aspects include the agent’s repertoire of be-
haviours, the brain structures and processes performing categorisations of reality, his 
memory, lexicon, grammar, and so on. In the case of humans, these two aspects are 
intimately connected and indivisible. We cannot teleport our mental faculties into an- 
other body, or into another copy of our body, even though there have been speculations 
that we could in the future record human brain states. 2 I believe that this will still not
enable teleportation because in human brains there is no distinction between hardware 
and software. The architecture of a human brain, the physical connections between cells, 
and the biochemical processes in each cell determine the brain’s behaviour. There is no 
separation between a brain program and an interpreter that reads brain programs and 
executes them. The brain is a special-purpose hardware device which is unique to each 
individual. To copy such a device we would have to rebuild it physically, atom by atom, 
and integrate it in an exact copy of the same body. 







Figure 2.1: The Talking Heads agents are implemented as software entities that can travel 
over the internet. To play a game, they get downloaded in a local server that 
drives the cameras and orchestrates a game. After a game, the software state 
is uploaded again to travel towards another location. 

However, in the case of computer-based artificial agents, we can make the distinction. 
It is possible to capture the mental state of an agent in software, load this into a physical 
body, and then operate the agent. Afterwards the agent can extract himself again from 
the body, teleport himself to another physical location through a data transmission net- 
work like the Internet, get instantiated there in another body, and experience another 
reality and physically meet other agents. This is exactly how we have implemented the 
Talking Heads experiment. 3 There are on the one hand the physical structures, which I 



2 Such visions of the future have been put forward by Moravec (1995). Neither current artificial intelligence 
technology nor the state of the art in brain state recording is anywhere near realising these visions.

3 The agent teleportation infrastructure is in itself a fascinating non-trivial engineering project. Contribu- 
tions from Angus McIntyre, Alexis Agahi, Sylvere Tajan and Frederic Kaplan are gratefully acknowledged 
(McIntyre, Steels & Kaplan 1999). The fact that agents can teleport proves that we are dealing with a 
truly distributed multi-agent system. It also introduces physical parallelism in the agent-agent and agent- 
environment interactions. For a general introduction into multi-agent system technologies and design 
methodologies, see Ferber (1998). 
will refer to as the robot bodies. They are installed in different physical locations some-
where in the world and connected with each other through the Internet. Then there 
is a population of software structures that are occasionally loaded and instantiated in 
specific robots. I will call these software structures virtual agents. A real agent (a 
Talking Head) only exists when the virtual agent is loaded in a physical robot body. 

Virtual agents cannot interact and an interaction between two “real” agents can only 
take place when they are both physically present in the same location. Thus an agent 
can travel from Paris to Tokyo in the blink of an eye, rather than having to take a plane,
but two agents can only interact when they are instantiated in the same physical envi- 
ronment. It is in principle possible that agents in different physical locations describe to 
each other the environment that they see (but which the other one does not see), just 
as we would do in a telephone conversation. This is only possible, however, after the 
agents have had sufficient interactions with each other in a shared physical world to 
have developed and learned a grounded shared language. 
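
As an illustration only, the separation between virtual agents and robot bodies can be sketched in a few lines of Python. This is not the infrastructure used in the experiment: the class names, the JSON encoding, and the rule that a body holds at most one agent are assumptions made for the sketch.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class VirtualAgent:
    """Software-only state of an agent (hypothetical fields)."""
    name: str
    lexicon: dict = field(default_factory=dict)   # word -> meaning label
    ontology: list = field(default_factory=list)  # category labels

    def serialise(self) -> str:
        # The whole mental state must fit in a message sent over a network.
        return json.dumps(asdict(self))

    @staticmethod
    def deserialise(payload: str) -> "VirtualAgent":
        return VirtualAgent(**json.loads(payload))


@dataclass
class RobotBody:
    """A physical installation at one site (assumed to hold one agent at most)."""
    site: str
    occupant: Optional[VirtualAgent] = None

    def load(self, payload: str) -> None:
        if self.occupant is not None:
            raise RuntimeError(f"body at {self.site} is occupied")
        self.occupant = VirtualAgent.deserialise(payload)

    def unload(self) -> str:
        if self.occupant is None:
            raise RuntimeError(f"body at {self.site} is empty")
        payload, self.occupant = self.occupant.serialise(), None
        return payload


def teleport(source: RobotBody, target: RobotBody) -> None:
    """Move the agent in `source` into `target`, possibly at another site."""
    if target.occupant is not None:
        raise RuntimeError(f"body at {target.site} is occupied")
    target.load(source.unload())


def can_interact(a: RobotBody, b: RobotBody) -> bool:
    # Only agents instantiated in two different bodies at the same site can play.
    return (a is not b and a.site == b.site
            and a.occupant is not None and b.occupant is not None)
```

In this sketch an agent loaded in a Tokyo body cannot interact with one in Paris until `teleport` has moved its serialised state into a free Paris body, which mirrors the constraint described above.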

The teleporting setup enables fascinating experiments, engulfing the whole globe. The 
same agent can look at a scene from different points of view or at scenes in different 
physical locations in the world by teleporting himself in different bodies. He can de- 
velop categories in one location and enrich his learning experience by moving to a new 
location which has different objects and thus poses new categorial challenges. There can 
be populations of varying sizes with new agents being born and older agents dying, just 
like in natural human populations, so that we can study the transmission of language 
from one generation to the next or the resilience of a language against an influx and 
outflux of agents. We can also let agents develop in different parts of the world and have 
them migrate to study intercultural exchange and language contact. 

2.1.2 The robots 

A blind person who receives sight after the critical period for acquiring visual categorisation
undergoes a traumatic experience (see Zeki 1993). We could in principle build
robot bodies through which agents can experience their world with tactile sensing and 
other robot bodies which support visual sensing, or both. But then an agent which had 
only access to tactile sensing might suddenly find himself in a body equipped with vi- 
sion. This pathological complication will be avoided by using the same robotic infras- 
tructure in every location, even though it would be fascinating and technically possible 
to study multi-modality in its own right. We also decided to make the robots vision- 
based, because visual sensing is one of the major sources of meaning in human natural 
languages. 

Concretely, each robot consists of five building blocks (see Figure 1.2): 

• A camera mounted on pan/tilt motors so that it can move up or down and left or 
right. 

• A loudspeaker (for voice output) and a microphone (for voice input). Each agent 
has a particular quality of voice output with male and female voices, so that it is 
possible to keep them apart in the dialogue. 
• A computer that can house an agent’s cognitive architecture as well as peripheral
control software to steer the movements of the camera, receive and preprocess im- 
ages, or synthesise and analyse sound. This computer is connected to the Internet 
to allow virtual agents to be loaded and instantiated. 

• A television screen that shows us what the agent instantiated in a body currently 
sees through the camera. 

• A computer screen that shows us what is going on inside the brain of the agent 
currently installed in the robot (Figure 2.2).




Figure 2.2: Interface through which the internal states of agents can be inspected. Two 
agents are shown. The top windows show the state of an agent, the middle 
windows the camera inputs that the agent sees and the bottom windows show 
their discrimination trees. 

In constructing the robot bodies, we have used as much as possible off-the-shelf stan- 
dard components so that we could focus almost completely on issues directly relevant 
to language and meaning. Each robot’s low-level vision system (integrated in the cam- 
era) is already very sophisticated. 4 It can focus automatically to get a sharper image 
and autonomously track a moving object. The speech signal is produced with a stan- 
dard text-to-speech system so that we did not have to worry about building complex 
audio modules ourselves. The computer hardware is powerful, but not specialised nor 
in the supercomputer range. All programs have been written in the standard symbolic 

4 The camera is a Sony EVI-D31. The main computer is a Power Macintosh from Apple, Inc. The agent servers 
run under the Linux operating system. 






programming language of artificial intelligence research, namely Lisp. 5 What makes the
Talking Heads experiment special is not the hardware or software tools but what we
have done with them.

2.1.3 The agents 

Agents can only engage in interactions with other agents when they are physically in- 
stantiated in a robot. Each agent has a basic brain architecture with different layers 
performing the cognitive functions relevant for playing language games: 

• A perceptual layer which performs low-level signal processing to segment the 
image and collect data about each segment such as the colour, size, position or 
shape of a segment. 

• A conceptual layer which categorises and conceptualises the segmented and pro- 
cessed image. It is based on a self-generated and evolving repertoire of categorial 
distinctions, such as red versus green, or small versus large. Such a repertoire is 
referred to as the agent’s ontology in this book. 

• A lexical layer which maintains an evolving repertoire of associations between 
meanings and words, which I will refer to as the lexicon, and performs lexical 
lookup while parsing or producing utterances. 

• A syntactic layer which uses grammatical schemata for organising words in larger 
structures or for recognising these structures and reconstructing complex mean- 
ings. 

• A pragmatic layer which carries out the scripts for playing language games and 
maintains the machinery for engaging in interactions with other agents in a shared 
environment. 

Each of these layers is described in more detail later. The internals of the layers are not 
static but constantly evolving and adapting. They are not strictly modular but coupled 
in various ways to each other. Each verbal interaction in effect changes the agent’s 
internal state and thus influences future behaviour. A new, virgin agent starts without 
any built-in ontology, lexicon, or grammar. This is one of the crucial points of the whole 
experiment, because we want to test possible theories on how language and meaning 
can evolve and be acquired ab initio. 
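
To make the idea of a virgin agent whose internal state changes with every interaction more concrete, here is a minimal Python sketch. It is not the architecture used in the experiment: the field names, the numeric scores attached to word-meaning pairs, and the update step `delta` are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Evolving state held by the layers of one agent (illustrative only)."""
    name: str
    ontology: set = field(default_factory=set)   # conceptual layer: categories
    lexicon: dict = field(default_factory=dict)  # lexical layer: (word, meaning) -> score
    grammar: list = field(default_factory=list)  # syntactic layer: schemata
    games_played: int = 0                        # bookkeeping for the pragmatic layer

    def is_virgin(self) -> bool:
        # A new agent has no built-in ontology, lexicon, or grammar.
        return not (self.ontology or self.lexicon or self.grammar)

    def record_game(self, word: str, meaning: str, success: bool,
                    delta: float = 0.1) -> None:
        """Update the internal state after one verbal interaction."""
        self.games_played += 1
        self.ontology.add(meaning)
        key = (word, meaning)
        score = self.lexicon.get(key, 0.5)  # assumed starting score
        score += delta if success else -delta
        self.lexicon[key] = min(1.0, max(0.0, score))
```

The point of the sketch is only that nothing is pre-loaded: every category and every word-meaning association an agent holds is the residue of earlier games.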

Agents are part of populations which determine the probability with which they en- 
counter each other. This generates a dynamic process at two levels: There is the dynam- 
ics of the evolving cognitive competence of each agent (the ontologies, the lexicon, the 

5 The construction of the programs underlying the Talking Heads experiment has required advanced artificial 
intelligence programming techniques, such as discussed by Norvig (1992). A general toolkit for the system- 
atic execution of simulation and physical experiments, called babel, has been designed and implemented 
by Angus McIntyre. The toolkit allows the definition and modular composition of cognitive architectures, 
the design of experiments, and the monitoring and displaying of results, see McIntyre (1998). 
grammar), and there are the evolving macroscopic structures which arise in a popula-
tion of agents, such as the common lexicons or shared grammars. We will see that the 
mental characteristics of agents, even in a single population, are never identical because 
each agent has his own history of interactions with the environment and with other 
agents. This is another crucial aspect of the experiment. We want to investigate to what
extent communication is possible without complete ontological or linguistic coherence.

2.1.4 Interactivity 

For this to qualify as a sound scientific experiment, anybody who wishes to challenge the
claims should be given the tools “to see for him- or herself”. There are three ways in which we
have empowered observers to do so.

First, each physical Talking Heads site has a complex infrastructure to organise the 
interactions between agents operating in that location, and to support the arrival and 
teleporting of agents. This infrastructure also houses a commentator, a computer pro- 
gram that monitors the dialogues, inspects the internal states of each agent, and displays 
useful statistics such as the degree of sharing of the lexicon, the competition between 
different words to express a particular meaning, the stability of certain syntactic con- 
structions, etc. The commentator produces spoken or written comments and displays 
measurement results on an additional computer screen. 

Second, the teleporting infrastructure makes it possible to implement interactions be- 
tween humans and artificial agents, either directly in the shared physical environment 
or through the Internet. At any time, a human experimenter can pretend to be one of the 
agents: seize a robot, partly control the camera to set the context of an interaction, and 
type in expressions playing the role of speaker or hearer in a language game. The human 
experimenter can create a new, virgin agent, track in detail how this agent acquires the 
categories and language in an existing group, or try to influence the currently dominat- 
ing language by introducing new words or constructs and following their propagation 
(see Figure 2.3). 

Finally, the environment has been restricted to increase the transparency of experi- 
mental results. It consists in all locations of a magnetic white board mounted on the wall 
in front of the robots (see Figure 2.4). On this board, the human experimenter can paste 
various figures, typically stylised geometric figures like rectangles, circles, and squares, 
in various sizes, shapes and colours. By changing the environment, the experimenter 
can try to find out what visual categories the agents employ and force the expansion of 
categorial repertoires, for example by pasting new types of figures on the board. He can 
probe the adaptivity of the agents by setting up situations that destabilise an existing 
lexicon and see how long it takes before a new, perhaps more abstract lexicon emerges. 
All these tools generate unprecedented opportunities to apply the most rigorous scientific
evaluation criteria to the theories of language and cognition that I will propose in this 
book. 
Figure 2.3: Internet interface through which users can access the state of games on re-
mote sites and follow the experiment. 




Figure 2.4: The physical environment of the Talking Heads consists of a white board on 
which various geometric figures can be pasted. The light conditions were not 
under the control of the experimenters in most locations. 
2.2 The Guessing Game

Given this rich computational and robotic infrastructure, many specific experiments are 
possible. Each experiment could explore a particular interaction between the robots, the 
environment, and human observers. In this book, I explore only one type of interac- 
tion, which I call the Guessing Game. 6 The Talking Heads play this game either among 
themselves or with a human experimenter. 

2.2.1 Rules of the game 

The Guessing Game is played between two physically instantiated agents. Agents in 
a virtual state must queue up to have access to one of the robot bodies installed in a 
particular site before they can play the game. One agent plays the role of speaker and 
the other then plays the role of hearer. Agents take turns playing games so that all of
them develop the capacity to be speaker or hearer. A human experimenter can pick one 
of these roles and play the game instead of an artificial agent. 

The speaker first looks at one area of the white board and directs the attention of the 
hearer to the same area. 7 The objects located in this area constitute the context. The 
speaker then chooses one object from the context, which I will call the topic, and gives a 
verbal hint to the hearer. The verbal hint is an expression that identifies the topic with 
respect to the other objects in the context. For example, if the context contains a red 
square, a blue triangle, and a green circle, then the speaker may say something like the 
red one to identify the red square as the topic. If the context contains also a red triangle, 
he has to be more precise and say something like the red square to distinguish the topic
from the red triangle as well as from the blue triangle. Of course, the Talking Heads do
not say the red square but use their own language and concepts which are not necessarily 
the same as those used in English. For example, they may say malewina to mean [UPPER
EXTREME-LEFT LOW-REDNESS].
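
The requirement that a hint identify the topic with respect to the other objects in the context can be sketched as a small search, shown here in Python. The sketch uses English colour and shape features purely for readability, whereas the agents use their own self-generated categories. The function name, the dictionary representation of objects, and the preference for the smallest set of features are all assumptions. The context is a variant of the example above in which the blue object is also a square, so that both colour and shape are needed once a red triangle is added.

```python
from itertools import combinations


def discriminating_description(topic, context):
    """Return the smallest set of feature values that fits only the topic."""
    others = [obj for obj in context if obj is not topic]
    names = sorted(topic)  # feature names in a fixed order
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # The description works if every other object differs on some feature.
            if all(any(other.get(n) != topic[n] for n in subset) for other in others):
                return {n: topic[n] for n in subset}
    return None  # no description can single out the topic


red_square = {"colour": "red", "shape": "square"}
blue_square = {"colour": "blue", "shape": "square"}
green_circle = {"colour": "green", "shape": "circle"}
red_triangle = {"colour": "red", "shape": "triangle"}

# "the red one" is enough in the small context.
small = discriminating_description(red_square, [red_square, blue_square, green_circle])

# With a red triangle present, colour alone no longer discriminates.
large = discriminating_description(
    red_square, [red_square, blue_square, green_circle, red_triangle])
```

Here `small` contains only the colour, while `large` contains both colour and shape. If two objects in the context share all feature values, the function returns `None`, which corresponds to a speaker who cannot find a discriminating hint.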

Based on the verbal hint, the hearer tries to guess what topic the speaker has chosen, 
and communicates his choice to the speaker by pointing to the object. Given that the 
robots do not have arms, pointing is realised by focusing on an object. One robot can 
“see” in which direction another one is looking, and thus know where he is “pointing”. 
The game succeeds if the topic guessed by the hearer is equal to the topic chosen by the 
speaker. The game fails if the guess was wrong or if the speaker or the hearer failed at 
some earlier point in the game. 

In case of a failure, the speaker gives an extra non-verbal hint to the hearer by himself 
pointing to the topic, and both agents try to repair their internal structures to be more 
successful in future games. The speaker weakens his hypothesis that the words and 
constructions he used were correct, in the sense of being shared by other agents. The hearer tries 
to guess what meanings the speaker might have used and to deduce what form-meaning 
relations or syntactic constructions he is missing. Pointing by gesturing is always vague, 
and the repair actions are far from guaranteed to succeed. Nevertheless, they gradually 
lead (as we will see) to a sufficiently shared communication system, meaning that success 
in guessing the topic purely on the basis of linguistic communication increases to reach 
almost 100%. 

6 A very similar game has been called the original language game by Brown (1973) in the context of research 
on child language acquisition. See also the thoughtful analysis in Halliday (1987). Research on child language 
has inspired the agent architectures and behaviours but they should not be seen as a realistic model 
of child language acquisition. 

7 I will explain later how exactly agents decide on a particular scene and how they are able to draw each 
other’s attention to specific areas of the white board. 

2.2.2 Nature of the game 

The Guessing Game is one of the common things we do with language. For example, I 
play a similar game when I sit with a friend at the dining table and say could you give 
me the salt. If she guesses correctly what I mean and hands me the salt, the game has 
succeeded. If she looks at me with a puzzled face (maybe she does not speak English) 
or hands me the salmon instead of the salt, the game has failed. In that case, I can 
gesture in the direction of the salt and say no, no, the salt please, and then she hopefully 
realises and gives the salt to me. Failure is common in natural language dialogues and 
may be caused by many factors. For example, the salmon could indeed be close to the 
salt and my pronunciation of the word salt may have sounded a bit like salmon, perhaps 
because there was loud music playing in the background or perhaps my friend does not 
understand English. A failure is often an opportunity to negotiate how something will be 
expressed in the future. For example, the hearer may pick up a new word or the speaker 
may realise that a certain word is not appropriate in this particular context. 

The Guessing Game is not a game of winners and losers because both agents win or 
both agents lose at the same time. But it is a game nevertheless, because it is played 
with clear rules, with a clear outcome and strict limitations on how success can be 
achieved. An agent can not look inside another agent’s brain state. Agents can only 
interact through the external environment. There is no global control center that is mon- 
itoring the behaviour and internal states of all agents and setting the way they should 
speak or perceive their world. The artificial agents are autonomous and fully distributed, 
just like human beings. 

The game is different from a closed world game like chess, because the environment 
is open. The human experimenter may introduce new objects at any time, any one agent 
can extend the language (for example invent a new word), possibly requiring the other 
agents to adopt it as well, or the human experimenter may inject new language forms 
in the dialogues. Agents try to maximise their communicative success by cooperating to 
the fullest and update or change their internal structures and processes to improve their 
chances in the game. 8 



8 The Guessing Game is a cooperative game, because both agents win or lose at the same time and 
have the highest gain if they develop co-ordinated behaviour. The agent who manages to have the most 
success in the game is the global winner. Because the commentator needs to know from the speaker 
which topic he wants to communicate before he is allowed to speak, no cheating is possible. Game theory, 
originally founded by John von Neumann and Oskar Morgenstern, can be applied to study the language 
game mathematically. We are dealing with an evolutionary game in which the players optimise their 
internal states to become better in the game; see Maynard Smith (1982). 
In the Talking Heads experiment, it is assumed that agents want to cooperate and that 
they use communication as part of their cooperation. The evolution of cooperative games 
has been studied extensively by artificial life researchers, often in the context of Robert 
Axelrod’s prisoner’s dilemma game; see for example Ikegami (1994). For a general introduction 
to how communication can evolve in the context of cooperation, see Hauser (1996). 
Computer simulations showing the evolution of cooperation and communication have 
been reported in the artificial life literature. See for example MacLennan & Burghardt 
(1993). Most of these computer simulations are closer to animal signaling systems than 
to human lexicons both in size and in terms of the complexity of meaning. 

The Guessing Game is clearly not the only thing we do with language. Humans are 
capable of playing a whole range of language games and inventing new ones when the 
circumstances require it; however, to do controlled experiments we need to limit our- 
selves. The objective of the Talking Heads experiment is not to cover the full range and 
complexity of human natural language interaction but to examine with objective preci- 
sion a limited number of issues concerning the nature of language and meaning. 

2.2.3 The semiotic square 

The environment of the Talking Heads is not fixed. The human experimenter may change 
the position of objects, add new kinds of objects, or eliminate others. Consequently a 
strategy of naming individual objects will not work. It would lead to a proliferation of 
proper names and it would require the Talking Heads to recognise objects, which is very 
difficult to do. 9 Indeed, humans don’t exclusively use proper names in natural language 
conversations either. We say could you give me the red small square as opposed to could 
you give me O143. Natural language words like red or small name perceptually grounded 
categories and syntactic structures indicate how they should be combined and used to 
find the topic. The relation between a language expression and its referent is therefore 
always indirect. This is summarised in the semiotic square (Figure 2.5), which I will use 
throughout this book to help understand and analyse the nature of language communi- 
cation. The semiotic square relates the four entities involved in a verbal interaction: 

• An utterance, such as "small red square", which is transmitted as a physical signal 
from one agent to another through the external environment. It is written 
between double quotes. 

• A meaning, which consists of categories like [RED], [SMALL], or combinations of 
categories, like {[RED] [SMALL]}. Labels of categories are written in capital letters 
between square brackets. 

• An image segment, denoted as S143, which is a segment of the image perceived 
through the camera. 

• A referent, like object O143, which is an entity in the real world. 

9 For a thorough exposition of the difficulties of object recognition, see Ullman (1996). Object constancy 
comes fairly late in the acquisition of a child’s ontology, as Piaget’s conservation experiments have shown. 
Language probably plays an important role in forming the notion of an object. 



[Diagram: four nodes labelled Referent, Image segment, Meaning, and Utterance, linked by arrows.] 



Figure 2.5: Any verbal interaction involves four entities here grouped in the semiotic 
square. The relation between utterance and referent always needs to be estab- 
lished indirectly by passing through perception and meaning. 

The systematic relation between meaning and referent is usually studied under the 
heading of semantics, and the systematic relation between meaning and utterance as 
grammar (including syntax, morphology and lexicon). 10 

Many tricky philosophical issues are raised in the unavoidable distinction between the 
sensed image of an object (which is local to the agent) and the object itself (which is exter- 
nal to the agent). Some philosophers even doubt that objects have an existence outside 
our perception of them! We need to make the distinction because agents always have 
different internal images even if they look at the same object seen from our viewpoint 
as an external observer. However, for simplifying the explanations, I will sometimes as- 
sume that perceived image and external object are the same, so that the semiotic square 
becomes a semiotic triangle. 11 

10 For a general introduction to the contemporary linguistic viewpoint on the processes involved, see Van 
Valin jr & LaPolla (1997). In a logical approach to language, as exemplified by Montague grammar (Mon- 
tague 1974), meanings are represented using a logical formalism, i.e. a variant of intensional logic. Natural 
language expressions are systematically related to expressions in this logic, and a formal semantics sys- 
tem defines how expressions in the logic are mapped onto their denotations. Because this is a formal 
framework, the denotations consist of formal models. To make the Talking Heads experiment work, we 
needed to develop a grounded semantics system, which details how an agent may go from physical reality 
to meaning using a perceptual apparatus, and from meaning to physical reality. The logical structure of 
the meanings we will investigate is very simple (unary predicates and conjunctions). But once we know 
how to set up a grounded semantics for simple meaning structures we can scale it up to the more complex 
meaning structures typically studied in logic. 

11 The notion of a semiotic triangle was first introduced in a classic of the semiotic literature, see Ogden & 
Richards (1935). 

When we put together the semiotic squares of two agents (Figure 2.6), we see more 
clearly that agents are trying to agree about a common object in the external world, but 
they never have direct access to it, and hence no confirmation that they are really referring 
to the same object. Only through pointing or other cooperative actions can speaker 
and hearer co-ordinate whether they indeed refer to the same object in external reality. 

The utterance is not the same for both agents, because it needs to be articulated, trans- 
mitted, and perceived through a physical medium. Errors in transmission or perception 
may and do occur and have an important impact on the evolution of language. To remain 
focused, I will not treat this issue in depth in this book but will instead assume that there 
is direct, error-free transmission of the utterance. 

2.2.4 Processes involved in language communication 

I will use the following terms for denoting the processes speakers and hearers go through 
while traversing the relations in the semiotic square as they play a language game (Fig- 
ure 2.6). A similar framework, but emphasising the language production side, has been 
described in great detail by Levelt (1989). This book also provides a wealth of psycho- 
logical evidence that these processes must be going on and expands the phonetic and 
phonological side. An example of a detailed architecture inspired by generative grammar 
is discussed in Jackendoff (1997). Generative approaches to language attempt to 
define a language by generating its set of possible utterances. Interpretations are constructed 
from the syntactic structure derived by the generative grammar. In this book, 
we are interested in the mapping from communicative intent in a perceived reality to an 
utterance and back. The knowledge and skill needed to solve this problem are different 
from those needed to systematically generate the set of sentences in a language and their 
possible interpretations. 

1. The speaker as well as the hearer perceive reality by capturing an image through 
the camera, segmenting the image into coherent units, and deriving various sensory 
characteristics about each image segment, such as colour, size, and movement. 

2. The speaker must then conceptualise the scene on the basis of this perception. 
He must find a set of categories or a conjunctive combination of categories that 
distinguishes the referent from the other objects in the context, and which will act 
as the meaning of his communication. For example, he might choose [BLUE] if the 
topic has a blue colour and none of the other objects in the context is blue. 

3. The speaker must then verbalise this conceptualisation. He must use his language 
system to find words and syntactic constructions expressing this meaning. For 
example, he might choose blauw (if he speaks Dutch), or bleu (if he speaks French), 
to convey the category [BLUE]. 

4. The hearer must engage in similar tasks but now going in the reverse direction. 
He must interpret the utterance to find out which conceptualisation constitutes 
the meaning. 

5. Then the hearer must apply this meaning to see what referent was intended. The 
hearer has also perceived the scene in terms of a set of segments and now uses 
the meaning to identify the segment that could have been the one intended by the 
speaker. 

6. Finally the hearer acts upon the outcome of meaning application. He points to 
the topic he has identified. This is the step where both agents co-ordinate their 
behaviour through the external world. 

Figure 2.6: Left: processes carried out by the speaker. Right: processes carried out by 
the hearer. There are also feedback processes moving in alternate directions 
until the agents settle on coherent choices for all the items in their semiotic 
squares. 

2.2.5 Knowledge sources and competences 

Each of these activities requires knowledge and/or skill (summarised in the table below). 
Perceiving requires visual processes capable of segmenting images and deriving image 
segments. Conceptualisation requires an ontology, a repertoire of perceptually grounded 
distinctions that can be applied to a segmented image to yield distinctive categories or 
category combinations that may constitute the meaning of the utterance. Verbalising 
a conceptualisation requires a lexicon that maps parts of the meaning to words and a 
syntax that specifies how to organise individual words into a larger complex. The hearer 
must use similar knowledge sources in the other direction. He must use his lexicon 
and syntax to reconstruct the meanings expressed by the utterance, and then use the 
ontology again to apply the meaning to the present context to find the referent (see 
Table 2.1). 

Table 2.1: Activities and knowledge sources 

Activity        From        To               Knowledge source 
Perceive        Object      Image segments   Visual processes 
Conceptualise   Topic       Meaning          Ontology 
Verbalise       Meaning     Utterance        Lexicon + syntax 
Interpret       Utterance   Meaning          Lexicon + syntax 
Apply           Meaning     Topic            Ontology 
Act             Topic       Object           Behavioural process 

There is no simple linear flow from perception to utterance or from utterance to perception. 
Instead, we must imagine a dynamic process involving forward and backward 
propagation of information until coherent choices for all the nodes in the two semiotic 
squares have been established by the speaker and the hearer. Many different choices 
are initially possible (many segmentations, conceptualisations, verbalisations, interpretations), 
but the dynamic process gradually settles into a single coherent attractor, so 
that speaker and hearer agree upon a common referent. The co-ordination between chosen 
topic and identified referent is done through the real world (pointing, handing an 
object, performing an action). 

I deliberately left an important aspect of language out. In the case of physical agents, 
the form cannot be transmitted directly but needs to be articulated in speech sounds, 
written signs, or gestures, to create a true utterance. This additional complexity will 
not be discussed further in this book - even though it is a fascinating topic in its own 
right. 12 To make a verbal interaction nevertheless complete, agents are given a reper- 
toire of consonants and vowels with which they can make random syllables and syllable 
combinations, like wabido, bimaku, etc. The articulation and recognition of these sylla- 
bles is assumed to be acquired already and transmission is engineered to be error-free. 
This way, our attention can be focused on how ontologies, lexicons, and grammars may 
emerge. 
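
As an illustration of how random syllable combinations such as wabido or bimaku could be produced, here is a minimal sketch. The consonant and vowel inventories, the fixed consonant-vowel syllable shape, and the function name are assumptions made for this example; the text does not specify the actual repertoire given to the agents.

```python
import random

# Assumed inventories, chosen only for illustration.
CONSONANTS = "bdfgklmnprstvwz"
VOWELS = "aeiou"

def random_word(rng, syllables=3):
    """Build a word from random consonant-vowel syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))

rng = random.Random(0)          # seeded so that runs are repeatable
words = [random_word(rng) for _ in range(5)]
print(words)
```

Because the word space is large compared with the number of words an agent needs, two agents inventing words independently will rarely produce the same form by accident.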



2.3 Perception and categorisation 

I will now discuss in some more detail each of the processes the Talking Heads go 
through when they play a complete language game, leaving a more detailed discussion 
to subsequent chapters. 

2.3.1 Scene and topic selection 

Every physical Talking Heads set-up features a waiting room in which agents are stored, 
ready to be loaded into the robotic infrastructure or to be teleported to another physical 
site. A game starts when two agents are chosen at random from this waiting room and 
loaded into the two physical robots. The internal architecture of each agent is connected 
with the sensori-motor apparatus of the robot so that the agents can receive the sensory data 
streams recorded by the camera and control the movements of the robots. Then the 
general control system randomly assigns the roles of speaker and hearer to the 
agents and the game can begin. 

12 For a state-of-the-art review, see Clark & Yallop (1995). We have already been conducting quite extensive 
research in our group on the origin of sound systems using principles similar to the ones discussed in 
this book for the origins of word semantics. See de Boer (2000). The complex adaptive systems approach 
underlying this phonetics work was already foreshadowed in the work of phoneticians like Liljencrants & 
Lindblom (1972). 

Next the speaker moves its camera around randomly, halts at a particular location, 
and captures the image. The speaker then attempts to segment this image. If the scene 
is interesting, which means that there are at least two clear segments, it is chosen for 
playing a language game. Otherwise the agent makes another random movement and 
repeats the process. An example of an interesting scene with its subsequent segmentation 
by the speaker is shown in Figure 2.7 (top image on the right). The scene has two 
circles as main objects. Segments which are too small are ignored. 

Figure 2.7: Three examples of segmented images (from top to bottom). Left and right images 
show the perceived and segmented images for two agents respectively. 
These images are always different because they are dependent on the position 
of the agent. The topic is indicated by a dashed bounding box in the image 
of the speaker (on the right for the first case and on the left for the others). 
Segments which are too small are ignored. The topics have all been conceptualised 
as being to the right and so the same word gofubo has been used to 
successfully refer to them. 
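
The scene selection loop described above (move at random, segment, keep the scene only if it has at least two clear segments) might look roughly as follows. Everything concrete here is assumed for the sake of the sketch: the area threshold, the representation of a segment as a dictionary with an "area" entry, and the toy_board stand-in that replaces the real camera and segmentation routines.

```python
import random

MIN_AREA = 0.01  # assumed: smallest segment area (fraction of image) worth keeping

def find_interesting_scene(capture_and_segment, rng, max_tries=100):
    """Move the camera at random until a scene has at least two clear segments."""
    for _ in range(max_tries):
        pan, tilt = rng.random(), rng.random()        # random camera position
        segments = capture_and_segment(pan, tilt)     # hypothetical vision routine
        clear = [s for s in segments if s["area"] >= MIN_AREA]  # drop tiny segments
        if len(clear) >= 2:
            return (pan, tilt), clear
    return None  # no interesting scene found within max_tries

def toy_board(pan, tilt):
    """Stand-in for camera and segmentation: objects sit on the right half only."""
    if pan < 0.5:
        return [{"area": 0.002}]                      # nothing but noise
    return [{"area": 0.05}, {"area": 0.08}, {"area": 0.001}]

result = find_interesting_scene(toy_board, random.Random(1))
print(result)
```

The max_tries bound is a safeguard added for the sketch so that the loop terminates even on an empty board.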

Segmentation can happen according to several criteria. For example, patches with similar 
colour can be grouped into a single patch, edges can be identified and then linked 
with each other to form line segments circumscribing the contours of an object, or consecutive 
images can be compared to extract the parts that changed and hence moved 
as a single object. There is now a solid body of techniques from decades of machine 
vision research to efficiently segment scenes according to these and other methods. The 
Talking Heads use several methods in parallel and combine their output to get a clearly 
segmented picture. 13 

The hearer must be able to sense in which direction the speaker is looking. This facility 
is at the moment implemented by having the speaker indicate to the hearer the point on 
the white board at which its camera is focused. The hearer then moves the camera to this 
point and records an image as well. The hearer segments this image (see Figure 2.7, top 
image on the left) so that now both the speaker and the hearer have a set of segments and 
can start playing a language game. Note that the speaker and hearer never get exactly the 
same image because they are standing about one metre apart from each other in front of the 
white board. Also, the calibration is never entirely accurate, so perceptual differences 
are unavoidable. 

2.3.2 Sensory channels 

Next, low-level visual processes gather information about each segment, such as its average 
colour, size, shape, and position with respect to the horizontal or vertical axes. Each 
process outputs its information on a sensory channel scaled between 0.0 and 1.0. The 
mechanisms from the conceptualisation layer that subsequently use this information can 
operate on any kind of sensory channel. I will assume for the rest of this chapter that the 
low-level routines produce only three types of information, sent on the following 
sensory channels: 

• The first channel is called hpos (horizontal position). It contains the x-midposition 
of a segmented object within the field of view of the robot. 

• The second channel is called vpos (vertical position). It specifies the y-midposition 
of the segmented object. 

• The third channel is called gray and contains the average gray-scale of the object. 

Later on additional channels will be introduced. 

Consider the two objects in Figure 2.8. The triangular object has the (scaled) values 
hpos=0.35, vpos=0.40, gray=0.33, and the rectangular object has the values hpos=0.70, 
vpos=0.85, gray=0.33. The agents can already visually distinguish millions of possible 
scenes with these three sensory channels. The number of scenes quickly grows when 
the set of sensory channels increases. 

The low-level visual processes outputting values on the various sensory channels are 
already quite complicated in themselves. I will discuss them further in Chapter 3. I will 
then argue that the agent’s repertoire of visual processes does not have to be static and 
programmed in advance but evolves and adapts through a selectionist process. New processes 
may “grow” by the random combination of primitive operations and are pruned 
when they fail to yield useful information or yield information which is irrelevant in the 
environment in which the agent finds himself. 

13 For general introductions to these areas, see Ballard & Brown (1982) and Fischler & Firschein (1987). 

2.3.3 Making distinctions 

The data on sensory channels are values from a continuous domain (between 0.0 and 1.0). 
To be the basis of natural language communication, these values must be transformed 
into a discrete domain. This is precisely the task of the conceptual layer. It performs 
massive data reduction to make infinitely rich environments manageable. One means 
of categorisation is to divide up each domain of values on a particular sensory channel 
into regions and simply assign a category to each region, thus creating a discrimination 
tree. For example, the hpos-channel can be cut in two halves leading to a distinction 
between [LEFT] (0.0 ≤ hpos < 0.5) and [RIGHT] (0.5 ≤ hpos ≤ 1.0). The triangular object 
in Figure 2.8 has the value hpos=0.35 and would therefore be categorised as [LEFT]. Similarly, 
the vpos-channel can be divided in two halves yielding the categories [LOWER] 
and [UPPER], and likewise the gray-channel yielding the categories [LIGHT] and [DARK]. 
Given these categories, the rectangular object in the scene of Figure 2.8 would be categorised 
as [RIGHT UPPER LIGHT]. Of course, light, dark, left, lower, etc. are labels that 
I have given to these categories. The Talking Heads create categories by partitioning 
sensory channels but do not use these labels internally. 
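
The binary split of each channel can be written down directly. The sketch below applies it to the two objects of Figure 2.8, using the channel values given in the text. The dictionary layout and function names are my own, and the assignment of the lower half of the gray channel to the light category is inferred from the example, where gray=0.33 is categorised as light.

```python
# Category labels for the lower and upper half of each channel.
CHANNEL_LABELS = {
    "hpos": ("LEFT", "RIGHT"),
    "vpos": ("LOWER", "UPPER"),
    "gray": ("LIGHT", "DARK"),   # inferred: gray=0.33 counts as light
}

def categorise(value, low_label, high_label):
    """Map a value in [0.0, 1.0] onto one of two categories."""
    return low_label if value < 0.5 else high_label

def categorise_object(obj):
    """Categorise an object on every channel, in channel order."""
    return [categorise(obj[channel], *labels)
            for channel, labels in CHANNEL_LABELS.items()]

triangle = {"hpos": 0.35, "vpos": 0.40, "gray": 0.33}
rectangle = {"hpos": 0.70, "vpos": 0.85, "gray": 0.33}

print(categorise_object(triangle))   # ['LEFT', 'LOWER', 'LIGHT']
print(categorise_object(rectangle))  # ['RIGHT', 'UPPER', 'LIGHT']
```

Both objects fall into the same gray category, so only the position channels can distinguish them in this scene.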

It is always possible to refine a distinction by further subdividing its region. Thus an 
agent could further divide the bottom region of the hpos-channel (categorised as [LEFT]) 
into two subregions, [TOTALLY-LEFT] (0.0 ≤ hpos < 0.25) and [MID-LEFT] (0.25 ≤ hpos < 0.5). 
The triangular object can now be categorised as [MID-LEFT], if simply categorising 
it as [LEFT] is not distinctive enough. Each of these categories can still be further refined. 
The categorisation networks resulting from these consecutive binary subdivisions form 
discrimination trees (Figure 2.9). It is not at all assumed that all agents have the same 
trees; due to different developmental histories, divergence is inevitable. 




Figure 2.8: The scene contains two objects: a triangular object and a rectangular object 
with the same gray levels. Each one is characterised by values on three sensory 
channels: hpos, vpos, and gray. 

Figure 2.9: A discrimination tree displays the divisions of the total range of values on a 
sensory channel into finer and finer subregions. Categories are assigned to 
each region at different levels of the tree. 

There are obviously other ways to move from the continuous domain of sensory chan- 
nels to the discrete domain of categories. 14 For example, it is not really necessary to have 
a binary division, we could just as well split each region into three or more subregions. 
Or we could introduce focal (prototypical) values and associate a category with each of 
them. In the latter case, the categorisation process consists in identifying the prototype 
that is closest to an object’s value. For the current experiment, I will stick however to a 
binary categorisation strategy because that is the simplest to understand and formally 
investigate. 
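
For comparison, the prototype-based alternative mentioned above can be sketched as a nearest-prototype lookup. The three focal values and their labels are hypothetical; they are not part of the experiment, which uses binary subdivision.

```python
# Hypothetical focal values on the hpos channel.
HPOS_PROTOTYPES = {"LEFT": 0.25, "CENTRE": 0.5, "RIGHT": 0.75}

def nearest_prototype(value, prototypes):
    """Return the label whose focal value lies closest to `value`."""
    return min(prototypes, key=lambda label: abs(prototypes[label] - value))

print(nearest_prototype(0.35, HPOS_PROTOTYPES))
print(nearest_prototype(0.70, HPOS_PROTOTYPES))
```

Unlike the binary scheme, the boundaries between categories here are implicit: they fall halfway between neighbouring prototypes and shift whenever a prototype is moved.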

We will see that agents build hundreds, even thousands, of categories as they play 
their language games, and in addition they make combinations of categories. To bring 
some order to this profusion of categories, I will label them using the sensory channel 
on which a category operates, followed by the lower and upper bounds of the region 
it carves out. Thus [TOTALLY-LEFT] is labeled as [HPOS 0.0-0.25], because it carves out 
a region between 0.0 and 0.25 on the hpos-channel. When it must be emphasised that a 
category belongs to a particular agent, for example a1, we write [HPOS 0.0-0.25]a1. The 
same category in agent a2 is labeled as [HPOS 0.0-0.25]a2. If I want to talk about this 
category in the abstract, I will not mention any agent and simply write [HPOS 0.0-0.25]. 
We will also allow conjunctive combinations of categories, which will be written as a set, 
as in {[HPOS 0.0-0.25], [VPOS 0.0-0.25]}, which could be paraphrased as totally-left and 
totally-down. 
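
A discrimination tree with this labelling scheme can be sketched as a small recursive data structure. The class and method names are hypothetical, and the tree is refined by hand here; how agents actually grow and prune their trees is the subject of Chapter 4 and is not modelled.

```python
class Node:
    """One region of a sensory channel; children are its two halves."""

    def __init__(self, channel, low, high):
        self.channel, self.low, self.high = channel, low, high
        self.children = []

    def label(self):
        return f"[{self.channel.upper()} {self.low}-{self.high}]"

    def contains(self, value):
        # Half-open regions, except that 1.0 belongs to the topmost region.
        return self.low <= value < self.high or (value == self.high == 1.0)

    def refine(self):
        """Split this region into two equal subregions."""
        mid = (self.low + self.high) / 2
        self.children = [Node(self.channel, self.low, mid),
                         Node(self.channel, mid, self.high)]
        return self.children

    def categories(self, value):
        """Labels of all regions, from the root down, that contain `value`."""
        if not self.contains(value):
            return []
        found = [self.label()]
        for child in self.children:
            found += child.categories(value)
        return found

def holds(conjunction, obj):
    """A conjunctive combination holds if every category in it holds."""
    return all(node.contains(obj[node.channel]) for node in conjunction)

hpos_root = Node("hpos", 0.0, 1.0)
left, right = hpos_root.refine()
totally_left, mid_left = left.refine()
vpos_root = Node("vpos", 0.0, 1.0)
totally_down = vpos_root.refine()[0].refine()[0]

print(hpos_root.categories(0.35))
print(totally_left.label(), totally_down.label())
```

With this tree, the value hpos=0.35 of the triangular object is covered by the whole channel, by the left half, and by the mid-left quarter, so an agent can choose the coarsest category that is still distinctive.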

Where do an agent’s discrimination trees, and hence categories, come from? This is 
one of the main puzzles to be addressed in this book and it will occupy most of Chapter 4. 
Very briefly, I propose again a selectionist approach, as for the formation of low-level 
visual routines. I hypothesise that the nodes and branches of the discrimination trees 
grow more or less randomly in all directions. The use and success of categories is moni- 
tored and categories which are not sufficiently useful or successful in the environments 
encountered by the agent are pruned. I will argue in Chapter 4 that this mechanism 
indeed leads to a repertoire of distinctions that is adequate for playing language games, 
and that categories therefore need not be innate nor learned by induction from a large 
series of examples. 



14 See Taylor (1995). 

2.4 Lexicalisation 

Verbalisation (mapping meaning to form) involves two distinguishable activities. The 
first one relies on a lexicon to map components of meaning to individual words. The 
second one relies on syntactic rules to provide supra-word structuring and additional 
syntactic marking to express additional aspects of meaning, particularly how component 
meanings are combined into a complex whole. Both types of activities also take place 
in interpretation (mapping form to meaning): the individual words are mapped back to 
their meanings and the meaning of the whole is reconstructed from the meaning of the 
parts. 

In the early origins of language, there must have been an initial phase in which no 
complex syntax was in place yet. Utterances then must have consisted of single words 
or multiple words without further syntax. Such syntax-less languages have been called 
proto-languages. 15 

Children acquire their first words around the first year of life. 16 Most people believe 
that they do this as a result of hearing a particular word repeated several times in a 
certain context and gradually abstracting an association between a word and a meaning. 
But how is it that they know which meaning to associate with a particular word? How 
is it that word acquisition proceeds so rapidly? We will follow a different approach, 
leaving open the question of whether it applies to human word acquisition as well. The 
approach will be selectionist. The agents construct hypotheses, either on the basis of 
one specific case where they guess through a non-verbal strategy what the meaning of 
an unknown word might be, or they have simply invented a new word because they 
do not have one yet. The hypotheses are then tried out in subsequent games and either 
receive confirmation or are falsified. As a side effect of this local behaviour, a global 
self-organising dynamic process arises leading gradually to coherence. 

Here are a few example games to give a general flavor of this selectionist approach to 
word acquisition. Let us assume that there are only two agents, a1 and a2, and that they use 
only the three sensory channels introduced earlier: vpos for vertical position, hpos for 
horizontal position, and gray for grayscale. 

2.4.1 Same meaning, same referent 

Here is the simplest possible instance of a language game, based on the scene in Fig- 
ure 2.8. The speaker, al, has picked the triangle as the topic. To give a verbal hint, he 
needs to conceptualise this topic, which means in this specific case, to find a category 

15 See Bickerton (1999) as well as Thomason & Kaufman (1988). Children similarly go through a single word 
phase (even though a single word for them might be multiple words for us), and then slowly start to boot- 
strap their grammar. See Tomasello (1992) and Bates, Bretherton & Snyder (1988). They are still observed 
in the very first phases of child language acquisition or in pidgins that spontaneously form when two 
communities with widely diverging languages need to interact. Out of proto -languages, languages with a 
fully-fledged syntax must have emerged at some point. How this occurred is still a heavily debated mystery. 
In this volume I only treat the origins of proto-languages. 

16 Representative work in the study of child language learning focuses mostly on the acquisition of specific 
meanings. See Gleitman & Landau (1994), Clark (1993), Bowerman (1996). 
(or set of categories) which distinguishes the topic from the other objects in the context.
Here the context contains only one additional object, namely the rectangle. The category
[vpos 0.0-0.5]_a1 (lower) fits the criteria. [vpos 0.0-0.5] is valid when vpos < 0.5, which
is the case for the triangle but not for the rectangle. a1 has an association in his lexicon
relating [vpos 0.0-0.5]_a1 with the word lu. a1 retrieves this word and transmits it to the
hearer, which is agent a2.

a2 has stored in his lexicon an association between lu and [vpos 0.0-0.5]_a2, and so
hypothesises that [vpos 0.0-0.5]_a2 must be the meaning of lu. When this category is
applied to the present scene, in other words, when a2 filters out the objects whose value
for vpos does not fall in the region [0.0-0.5], only one object remains, the
triangle. Hence a2 concludes that this must be the topic and points to it.

The speaker recognises that the hearer has pointed to the right object and so the game 
succeeds. The complete dialogue is reported by the commentator as follows: 

Game 125.

a1 is the speaker. a2 is the hearer.
a1 segments the context into 2 objects
a1 categorises the topic as [VPOS 0.0-0.5]
a1 says: "lu"
a2 interprets "lu" as [VPOS 0.0-0.5]
a2 points to the topic
a1 says: "OK"

This game illustrates a situation where the speaker and the hearer associate the same 
meaning with the same form and where the meaning picks out the same referent for both 
agents. No wonder the game succeeds. Unfortunately things are seldom that simple. 
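
The conceptualisation and interpretation steps of Game 125 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the experiment: the object coordinates, the fixed list of candidate categories, the rule of taking the first distinctive category, and the helper names `holds`, `discriminate` and `interpret` are all invented for the example.

```python
# Hypothetical scene loosely modelled on the two-object scene of Game 125.
# All channel values are assumed to be scaled to the interval [0, 1].
scene = {
    "triangle": {"vpos": 0.3, "hpos": 0.2, "gray": 0.6},
    "rectangle": {"vpos": 0.8, "hpos": 0.7, "gray": 0.6},
}

# A category is written (channel, low, high), e.g. [vpos 0.0-0.5].
CATEGORIES = [
    ("vpos", 0.0, 0.5), ("vpos", 0.5, 1.0),
    ("hpos", 0.0, 0.5), ("hpos", 0.5, 1.0),
    ("gray", 0.0, 0.5), ("gray", 0.5, 1.0),
]

def holds(category, obj):
    """A category is valid for an object if the channel value lies in its region."""
    channel, low, high = category
    return low <= obj[channel] < high

def discriminate(topic, scene, categories=CATEGORIES):
    """Return the first category valid for the topic and for no other object."""
    for category in categories:
        others = (obj for name, obj in scene.items() if name != topic)
        if holds(category, scene[topic]) and not any(holds(category, o) for o in others):
            return category
    return None

def interpret(category, scene):
    """Filter the scene, keeping the names of the objects the category is valid for."""
    return [name for name, obj in scene.items() if holds(category, obj)]

lexicon_a1 = {("vpos", 0.0, 0.5): "lu"}   # speaker: meaning -> word
lexicon_a2 = {"lu": ("vpos", 0.0, 0.5)}   # hearer: word -> meaning

meaning = discriminate("triangle", scene)          # a1 categorises the topic
word = lexicon_a1[meaning]                         # a1 says the word
candidates = interpret(lexicon_a2[word], scene)    # a2 interprets the word
success = candidates == ["triangle"]               # a2 points, a1 checks
```

In this toy scene both the vpos and the hpos category would single out the triangle; the sketch simply returns whichever comes first in the candidate list.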

2.4.2 A new word 

There are (at least) two ways in which a game similar to Game 125, but played earlier,
might have failed. First of all it can be that the speaker does not have a word yet for the
meaning he wants to convey. The speaker then invokes a strategy to repair this failure.
The simplest strategy is to create a new word for the present meaning. This is how a1
might have created the word lu, and associated it with [vpos 0.0-0.5] in his lexicon. Such
simple constructive steps cause new words to enter the lexicon. 

Second, it can be that the hearer does not know the word. The game then fails and
the speaker points to the topic so that the hearer can make an educated guess what
the meaning might have been. If the hearer is able to recover a possible topic from the
non-verbal hint given by the speaker, he himself can seek a distinctive category or set
of categories that delineates this topic from the other objects in the context, just as the
speaker has done. It is possible (although not necessary, as we will see) that the hearer
a2 arrives at his own version of the same category, namely [vpos 0.0-0.5]_a2. The hearer
now stores in his lexicon a new association between the form heard, lu, and the guessed
meaning [vpos 0.0-0.5]_a2. Based on this extended lexicon, he will succeed in the same
game in the future and he can use lu himself to verbalise [vpos 0.0-0.5]_a2 when talking
to other agents. This is the way new words spread in the population.

A game where these two repair activities have taken place is reported by the commen- 
tator as follows: 

Game 25.

a1 is the speaker. a2 is the hearer.
a1 segments the context into 2 objects
a1 categorises the topic as [VPOS 0.0-0.5]
a1 creates a new word: "lu"
a2 does not know "lu"
a2 says: "lu?"
a1 points to the topic
a2 categorises the topic as [VPOS 0.0-0.5]
a2 stores "lu" as [VPOS 0.0-0.5]
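
The two repair strategies of Game 25 can be sketched as follows. The code is a hedged illustration only: the `Agent` class, the `play` function and the word-formation scheme (one random consonant-vowel syllable) are assumptions made for the example, and `hearer_guess` simply stands in for the category the hearer would find for the topic after the speaker has pointed to it.

```python
import random

CONSONANTS = "bdfklmnprstvwxz"
VOWELS = "aeiou"

def invent_word(rng):
    """Assumed word-formation scheme: one random consonant-vowel syllable."""
    return rng.choice(CONSONANTS) + rng.choice(VOWELS)

class Agent:
    def __init__(self, name):
        self.name = name
        self.lexicon = []  # list of (word, meaning) associations

    def word_for(self, meaning):
        return next((w for w, m in self.lexicon if m == meaning), None)

    def knows(self, word):
        return any(w == word for w, _ in self.lexicon)

def play(speaker, hearer, speaker_meaning, hearer_guess, rng):
    """Play one game and return (word, repairs carried out during the game)."""
    repairs = []
    word = speaker.word_for(speaker_meaning)
    if word is None:                      # repair 1: the speaker lacks a word
        word = invent_word(rng)
        speaker.lexicon.append((word, speaker_meaning))
        repairs.append("speaker created a new word")
    if not hearer.knows(word):            # repair 2: the hearer lacks the word
        hearer.lexicon.append((word, hearer_guess))
        repairs.append("hearer stored the word")
    return word, repairs

LOWER = ("vpos", 0.0, 0.5)
rng = random.Random(25)
a1, a2 = Agent("a1"), Agent("a2")

first_word, first_repairs = play(a1, a2, LOWER, LOWER, rng)
second_word, second_repairs = play(a1, a2, LOWER, LOWER, rng)
```

The first call needs both repairs; the second call reuses the stored word and needs none, which is the situation of Game 125.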

2.4.3 Competition between words 

The observant reader will have noticed immediately that I have swept some important
difficulties under the carpet. First, it could very well be that, unknown to the speaker,
another agent had already used the word lu for another meaning, so that there are now
two alternative meanings for lu in the lexicon: lu has become ambiguous. Second, it may
be that the hearer a2 already had a word for [vpos 0.0-0.5]_a2, for example bomida, so that
there are now two synonyms for [vpos 0.0-0.5] in the lexicon. Synonymy and ambiguity
are very common in natural languages and unavoidable when a group of distributed 
autonomous agents creates a language without a central co-ordinator. This implies that 
the agents’ lexicon must be sophisticated enough to support multiple associations. An 
agent must be able to store the different meanings being used for the same word, and 
the different words being used for the same meaning. 

Will this process not lead to a proliferation of words and massive, inefficient lexicons, 
particularly in large populations? No. As I will discuss extensively in Chapter 5, a pos- 
itive feedback loop between use and success can be set up causing progressive conver- 
gence towards an efficient lexicon. The agents keep track of the use and success of each 
form-meaning pair and prefer the forms that have had the most success in past use. The 
more success a form has, the more it will be chosen and consequently the more success 
it will have in the future. This positive feedback loop creates a winner-take-all situa- 
tion because as soon as one form is slightly preferred, its success grows and overtakes 
its competitors to eventually dominate (Figure 2.10). Particularly in open environments, 
the dominance may only be temporary after which a new struggle develops and another 
word becomes the winner. 
Figure 2.10: The graph shows the competition between different forms for expressing
one meaning in a population of 20 agents. The graph plots the frequency of 
each form in the population, more precisely the percentage of agents that 
prefer a particular word. A complex dynamic process unfolds with periods 
where one word (first xu and then fepi) dominates. 
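
The positive feedback loop between use and success can be illustrated from the point of view of a single agent. The sketch below is an assumption-laden toy, not the scoring scheme detailed in Chapter 5: the `ScoredLexicon` class and its counters are hypothetical, and every game is simply assumed to succeed so that the effect of a small initial advantage is easy to see.

```python
class ScoredLexicon:
    """Form-meaning pairs with use and success counters (hypothetical structure)."""

    def __init__(self):
        self.stats = {}  # (form, meaning) -> [use, success]

    def add(self, form, meaning):
        self.stats.setdefault((form, meaning), [0, 0])

    def record(self, form, meaning, succeeded):
        entry = self.stats.setdefault((form, meaning), [0, 0])
        entry[0] += 1
        if succeeded:
            entry[1] += 1

    def preferred_form(self, meaning):
        """Prefer the form with the most success in past use."""
        forms = [form for (form, m) in self.stats if m == meaning]
        return max(forms, key=lambda form: self.stats[(form, meaning)][1], default=None)

LOWER = ("vpos", 0.0, 0.5)
lexicon = ScoredLexicon()
lexicon.add("bomida", LOWER)
lexicon.add("lu", LOWER)
lexicon.record("lu", LOWER, succeeded=True)      # one early success tips the balance

for _ in range(10):
    form = lexicon.preferred_form(LOWER)
    lexicon.record(form, LOWER, succeeded=True)  # assumption: every game succeeds
```

After the loop, lu has been used and has succeeded eleven times while bomida has never been chosen again. In a real population the outcome of each game also depends on the hearer's lexicon, which is what makes the collective dynamics of Figure 2.10 far less trivial than this single-agent view.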

2.4.4 Disambiguation 

Another difficulty, which I did not deal with when discussing Game 25 above, is that
the hearer may conceptualise the scene differently from the speaker. For example, the
triangular object is not only located in the lower half of the scene and the rectangle in
the upper half, but it is also to the left, i.e. with hpos < 0.5, whereas the rectangle is to
the right. It follows that the hearer could just as well have hypothesised that lu means
[hpos 0.0-0.5] (left) and not [vpos 0.0-0.5] (lower).

Is this bad? It depends on the situations being encountered in the future. When the
game is played again for the same scene with a1 saying lu to mean [vpos 0.0-0.5], and
a2 interpreting lu as [hpos 0.0-0.5], the game would succeed! Communicative success
is achieved whenever the hearer recovers the referent chosen by the speaker; it is not 
required that they use exactly the same meaning. In fact, the hearer can never know 
for sure what meaning was initially conceptualised by the speaker and neither can the 
speaker know which meaning was understood by the hearer because they cannot look 
inside each other’s head. The meaning can be quite different, as we have just seen, as 
long as it is compatible. 
Even more remarkably, if a topic in the lower portion of the scene is always
on the left side and vice versa, there would always be success despite the different
meanings of lu, and the players would never discover that each means something else
by lu.
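
A small sketch makes the point concrete. The three scenes below are hypothetical and deliberately constructed so that the lower object is always the left one; `referents` is an assumed helper that filters a scene with a category.

```python
def referents(category, scene):
    """Names of the objects whose channel value falls inside the category's region."""
    channel, low, high = category
    return [name for name, obj in scene.items() if low <= obj[channel] < high]

LOWER = ("vpos", 0.0, 0.5)   # what a1 means by lu
LEFT = ("hpos", 0.0, 0.5)    # what a2 understands by lu

# Hypothetical scenes in which "lower" and "left" always coincide.
scenes = [
    {"triangle": {"vpos": 0.3, "hpos": 0.2}, "rectangle": {"vpos": 0.8, "hpos": 0.7}},
    {"circle": {"vpos": 0.1, "hpos": 0.4}, "square": {"vpos": 0.6, "hpos": 0.9}},
    {"triangle": {"vpos": 0.45, "hpos": 0.05}, "circle": {"vpos": 0.95, "hpos": 0.55}},
]

# A game succeeds when the hearer's meaning yields exactly the speaker's referent.
outcomes = [
    len(referents(LOWER, scene)) == 1 and referents(LOWER, scene) == referents(LEFT, scene)
    for scene in scenes
]
```

Every entry of `outcomes` is true: as long as the environment keeps the two properties correlated, nothing in the games can reveal that the agents hold different meanings.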

Similar situations arise in human natural language communication as well, particu- 
larly for words which are not stabilised yet or whose interpretation depends strongly on 
the non-verbal information provided by the context. They also arise when the speaker 
or hearer have different sensory modalities or sensibilities. For example, a colour-blind 
person (unable to make the distinction between red and green) will recognise the red 
traffic light by its position. For that person, stop for the red light does not mean stop for 
the light that has the colour red but stop when the upper-most light is lit. 

These examples show clearly that shared language and meaning arise from the efforts 
of agents to co-ordinate their conceptualisations and lexicalisations with respect to the 
environments they encounter and the games they play, but that these co-ordinations 
cannot be perfect nor totally uniform because the agents have limited rationality. In 
general, we cannot assume that different agents have exactly the same conceptualisation
of reality and that they mean the same thing by the same words. As we will see
in later experiments, the Talking Heads hardly agree on the meaning of a word, par- 
ticularly in the early phases of language development, but they nevertheless manage 
to have a surprisingly high communicative success rate. There is no guarantee that a 
particular form maps onto the same meaning, even in the same language community. 
Despite these shaky foundations, communication is generally successful because there 
is sufficient coherence among the members of a community and sufficient constraints 
from the context. 




Figure 2.11: A second scene which can be used to disambiguate lu. 

Consider now another scene (Figure 2.11). The speaker is again a1 and categorises the
bottom triangle as [vpos 0.0-0.5]_a1, meaning the object at the lower half of the scene.
Assuming that the hearer a2 has associated lu with [hpos 0.0-0.5]_a2, he would interpret
lu as [hpos 0.0-0.5]_a2 (to the left). But this does not yield a single referent and therefore
the game fails. This failure is an opportunity for the hearer to repair his hypotheses
about the possible meanings of lu. When he conceptualises the scene himself, he finds
that [vpos 0.0-0.5]_a2 distinguishes the triangle from the circle. A new association
between lu and [vpos 0.0-0.5]_a2 is stored. The old association is not removed but enters
into competition with the new one, and gradually one meaning will come to dominate by
the winner-take-all mechanism discussed earlier on.

The commentator reports this kind of interaction as follows: 

Game 137.

a1 is the speaker. a2 is the hearer.
a1 segments the context into 4 objects
a1 categorises the topic as [VPOS 0.0-0.5]
a1 says: "lu"
a2 interprets "lu" as [HPOS 0.0-0.5]
There is more than one such object
a2 says: "lu?"
a1 points to the topic
a2 categorises the topic as [VPOS 0.0-0.5]
a2 stores "lu" as [VPOS 0.0-0.5]
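
The hearer's side of Game 137 can be sketched as follows. The four-object scene is hypothetical (the coordinates are invented to mimic the situation described for Figure 2.11), and the helpers `interpret` and `repair`, as well as the convention that the first stored meaning is the currently preferred one, are assumptions made for the example.

```python
def referents(category, scene):
    """Names of the objects whose channel value falls inside the category's region."""
    channel, low, high = category
    return [name for name, obj in scene.items() if low <= obj[channel] < high]

LEFT = ("hpos", 0.0, 0.5)
LOWER = ("vpos", 0.0, 0.5)

# Hypothetical scene with four objects; two of them are on the left.
scene = {
    "bottom triangle": {"vpos": 0.2, "hpos": 0.3},
    "circle": {"vpos": 0.8, "hpos": 0.2},
    "top triangle": {"vpos": 0.9, "hpos": 0.7},
    "rectangle": {"vpos": 0.6, "hpos": 0.8},
}

def interpret(lexicon, word, scene):
    """Apply the preferred meaning; return the referent only if it is unique."""
    meanings = lexicon.get(word, [])
    if not meanings:
        return None
    found = referents(meanings[0], scene)
    return found[0] if len(found) == 1 else None

def repair(lexicon, word, topic, scene, categories):
    """After the speaker points, store a category that singles out the topic."""
    for category in categories:
        if referents(category, scene) == [topic]:
            if category not in lexicon.setdefault(word, []):
                lexicon[word].append(category)   # the old hypothesis is kept
            return category
    return None

lexicon_a2 = {"lu": [LEFT]}
guess = interpret(lexicon_a2, "lu", scene)       # two objects are on the left
learned = repair(lexicon_a2, "lu", "bottom triangle", scene, [LEFT, LOWER])
```

After the repair, `lexicon_a2` holds both hypotheses for lu, which then compete in later games.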

Through such disambiguating situations, meanings of words get clarified and the lex- 
icons of the agents become more similar. Note that the dominating meaning of lu can 
still go in two directions. For example, a2 could have used lu with a1 in a situation where
its only possible meaning is [hpos 0.0-0.5] (left). Then a1, now playing the role of hearer,
would have stored the association between lu and [hpos 0.0-0.5]. If this happened often
enough, [hpos 0.0-0.5] (left) might have become the dominant meaning of lu, instead of
[vpos 0.0-0.5] (lower). This shows clearly that the evolution of language will never be
predictable. At a critical bifurcation, small preferential differences between the agents 
or the chance occurrence of certain situations may tilt the competition in one direction 
or the other. 17 There is no right or wrong solution in the language game and no one has 
any more rights than anyone else. 

2.4.5 Same meaning, different referent 

When agents have the same meaning for the same form, it is likely that they pick out 
the same referent from the context. When agents have a different meaning for the same 
form, it is less likely although it may still happen that they pick out the same referent, 
as we have seen. But there is an even more problematic situation: When agents have the 
same meaning for the same form but nevertheless pick out different referents! 

For example, suppose that two Talking Heads have developed the concepts of [left]
and [right] with respect to their own position in front of the scene. In terms of the
sensory channels we have been using, anything to the left of their field of vision (0.0 <
hpos < 0.5) is categorised as left, i.e. [hpos 0.0-0.5], and everything to the right (0.5 < hpos <
1.0) is categorised as right, i.e. [hpos 0.5-1.0]. But because the agents stand next to each
other and have therefore slightly different positions with respect to the scene in front 



17 This high sensitivity to initial conditions is one of the characteristics of a chaotic dynamic system, see 
Lorenz (1993). Indeed, as I will explore in more detail later, evolving languages have all the characteristics 
of complex adaptive systems, including the potential for punctuated equilibria. 
of them, there is an area which will be categorised as right for one head and left for the
other (Figure 2.12). 




Figure 2.12: Two Talking Heads, a1 and a2, each seeing the scene from a slightly
different viewpoint.

When this occurs in real dialogues, the form-meaning pairs lexicalising the distinctions
left and right destabilise, because sometimes they get positive feedback as they succeed
in the game, and at other times they get negative feedback. This is one of the reasons why
human categories are often relative and scaled with respect to a context. For example, 
we say the triangle to the left of the square, in other words left with respect to the square, 
to avoid the uncertainty inherent in the absolute use of left. Humans scale the size of 
objects with respect to the scene, so that small objects are small compared to the others 
in the scene and not small in an absolute sense. 
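
The effect of the two viewpoints can be illustrated with a deliberately crude model. The sketch assumes that each head sees the scene shifted horizontally by a fixed offset; this is a stand-in for the real camera geometry, and the offsets, object positions and function names are all hypothetical.

```python
def hpos_seen(true_position, viewpoint_offset):
    """Toy model: the perceived hpos is the true position shifted by the agent's offset."""
    return min(1.0, max(0.0, true_position - viewpoint_offset))

def left_or_right(hpos):
    """Both agents share exactly the same categories: [hpos 0.0-0.5] and [hpos 0.5-1.0]."""
    return "left" if hpos < 0.5 else "right"

OFFSET_A1 = -0.05   # a1 stands slightly to the left of the centre (assumed)
OFFSET_A2 = 0.05    # a2 stands slightly to the right of the centre (assumed)

objects = {"a": 0.20, "b": 0.52, "c": 0.90}   # hypothetical true positions

disagreements = [
    name
    for name, position in objects.items()
    if left_or_right(hpos_seen(position, OFFSET_A1))
    != left_or_right(hpos_seen(position, OFFSET_A2))
]
```

Only object b, which lies close to the category boundary, is categorised differently by the two heads, even though both use the same meaning for left and right.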



2.4.6 Situated grounded semantics 

These various examples, and I am clearly only scratching the surface, already illustrate 
the major thrust of the approach explored in the remainder of this book. There are 
dynamic processes at two levels. (1) There is the evolution of a lexicon in a single agent: 
new words are invented or adopted from another agent and the scores of form-meaning 
pairs in the lexicon go up and down depending on success or failure in the game. At 
any point in time, an agent will have a preferred form to express a particular meaning, 
but he has also stored the alternative words floating around in the population, and his 
preference will change depending on feedback in further language games. (2) There 
is also lexicon evolution at the level of the group. A coherent global lexicon emerges 
because the lexicons of the individual agents become more and more similar. This is due 
to the positive feedback loop between the outcome of using a certain form-meaning pair 
and the probability that this pair will be used again in the future. The group lexicon 
will however seldom be exactly uniform because new meanings constantly pop up and 
agents may arrive from other language communities, bringing in new words. I will study
this two-level dynamic process in much more detail in Chapter 7. It explains many of
the mysteries of language, for example why there is ambiguity. 

The semantic theory required to make the Talking Heads experiment work is very 
different from the classical textbook approaches to the subject, which tend to assume that 
categories can be defined in the abstract, independent of their context of use. Grounded 
processes of conceptualisation and interpretation are necessarily strongly situated and 
context-dependent. Whether ‘blue triangle’ is going to be effective in picking out the 
intended topic depends on what else is in the scene. If all other objects in the context 
are also blue triangles, the game will fail. If there is only one triangle, it was not really 
necessary to say that it is blue. This situatedness and context-dependence, together with 
the non-verbal hints given by the speaker, are crucial keys for restricting the possible 
meanings of a form and they therefore help the hearer disambiguate utterances or figure 
out the meaning of an unknown form. But situatedness and context-dependence are also 
major sources of difficulty, because the same categories may refer to different things for 
different agents in different contexts. Even if both agents agree that a particular form 
names a certain category, they may still fail in the game if the category picks out a 
different referent, for example because they see the context from a slightly different 
vantage point. 

Another major difference with classical theories of meaning and reference is that the 
repertoire of meanings and form-meaning pairs is open-ended and subject to change at 
any time. Logicians would say that the truth conditions of the forms are non-monotonic. 
Non-monotonicity is unavoidable in the case of grounded situated agents with limited 
rationality. 18 Agents should always be allowed to introduce new forms or recruit existing 
forms to express new meanings, simply because the set of meanings must expand to 
cope with novel contexts and new communicative situations. After enough interactions 
among the members of the same group in a relatively stable environment, we expect 
there to be a stable set of conventionalised form-meaning relations, but it cannot be 
expected that everything anyone ever wants to say is already conventionalised, and so 
there will always be turbulence at the fringes. The idea that the lexicon and the syntax 
of a language are static entities is completely false. Both are in constant flux. Non- 
monotonicity is consequently one of the big topics in logic-based approaches to artificial 
intelligence. 

A theory of language must try to capture the strategies by which language users shape 
and reshape their language and by which some solutions may become conventionalised 
and spread in the rest of the population, rather than focus on describing the end results 
of this process. This open-ended adaptive character of language systems is explored 
extensively in Chapter 7. 

This section probably raised many questions in the mind of the critical reader: Will 
the agents really reach a sufficiently similar lexicon to have successful communication, 



18 Examples of these processes are described in: Bybee, Perkins & Pagliuca (1994), Traugott & Heine (1991). 
For attempts to explain these “grammaticalisation” processes in terms of general cognitive operations, see 
Heine (1997). 
despite the fact that no one has a complete overview nor controls globally what the
others can say? Are the unclear situations, i.e. those where more than one
meaning is possible, not going to destabilise existing lexicons? Will all this scale up
both for the size of the agent set and for the size of the meaning set? Is a new, virgin 
agent going to be able to catch up with a lexicon already existing in a population? What 
happens when two populations each with their own entrenched lexicon meet? These are 
exactly the kinds of questions that I will address later. They can all be posed in a precise 
way within the context of the Talking Heads experiment. 



2.5 The origins of grammar 

The self-organisation of a shared lexicon in a population of agents already represents 
a formidable challenge. We need to find models that work in principle and we have to 
make them operational on physically instantiated autonomous robots. But many linguists
would claim that this does not yet represent true language; they would want to
see the emergence of a genuine syntax. How that might happen is one of the ultimate
remaining scientific mysteries of our time, but let me sketch a possible approach.

Basically, I hypothesise that the agents must first of all have the capability to generate 
much more complex semantic and pragmatic strategies for conceptualising reality or 
applying a conceptualisation to retrieve a referent. 

For example, a phrase like ‘The two red triangles to the left of the smallest green square’
requires the following strategy: 

1. The agent filters the set of possible objects in the scene to retain only the squares. 

2. He further filters the resulting set to retain only the green ones. 

3. He orders the remaining green squares based on size and then picks out the small- 
est one. 

4. Then he orders the objects in the scene based on their horizontal position (hpos) 
and retains only those whose position is to the left of the smallest green square. 

5. From the remaining set he retains only the triangles, and from this set only those with a
red colour.

6. This final set should contain only two members and they constitute the referents 
of the original phrase. 
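
The six steps above can be sketched as a composition of a few primitive operations. This is a hand-written illustration, not the mechanism by which agents generate such strategies autonomously; the scene, the attribute names and the primitives `keep`, `smallest` and `left_of` are assumptions made for the example.

```python
# Hypothetical scene: each object has a shape, a colour, a size and a horizontal position.
scene = [
    {"id": 1, "shape": "square", "colour": "green", "size": 0.2, "hpos": 0.6},
    {"id": 2, "shape": "square", "colour": "green", "size": 0.5, "hpos": 0.3},
    {"id": 3, "shape": "square", "colour": "blue", "size": 0.1, "hpos": 0.9},
    {"id": 4, "shape": "triangle", "colour": "red", "size": 0.3, "hpos": 0.1},
    {"id": 5, "shape": "triangle", "colour": "red", "size": 0.3, "hpos": 0.4},
    {"id": 6, "shape": "triangle", "colour": "red", "size": 0.3, "hpos": 0.8},
    {"id": 7, "shape": "triangle", "colour": "green", "size": 0.3, "hpos": 0.2},
    {"id": 8, "shape": "circle", "colour": "red", "size": 0.3, "hpos": 0.5},
]

def keep(objects, attribute, value):
    """Primitive: filter a set, retaining the objects with the given attribute value."""
    return [o for o in objects if o[attribute] == value]

def smallest(objects):
    """Primitive: order a set by size and pick the first element."""
    return min(objects, key=lambda o: o["size"])

def left_of(objects, landmark):
    """Primitive: order by hpos and retain the objects to the left of the landmark."""
    ordered = sorted(objects, key=lambda o: o["hpos"])
    return [o for o in ordered if o["hpos"] < landmark["hpos"]]

def two_red_triangles_left_of_smallest_green_square(scene):
    squares = keep(scene, "shape", "square")               # step 1
    green_squares = keep(squares, "colour", "green")       # step 2
    if not green_squares:
        return None
    landmark = smallest(green_squares)                     # step 3
    to_the_left = left_of(scene, landmark)                 # step 4
    triangles = keep(to_the_left, "shape", "triangle")     # step 5
    red_triangles = keep(triangles, "colour", "red")
    return red_triangles if len(red_triangles) == 2 else None   # step 6

result = two_red_triangles_left_of_smallest_green_square(scene)
```

In this scene the smallest green square is object 1, and the two red triangles to its left are objects 4 and 5. The same category (for example green) is used here for filtering, while size is used for ordering, which hints at the functional specialisation of categories.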

Our research has already led to mechanisms whereby agents can autonomously generate 
such semantic strategies using processes that compose primitive strategies into complex 
ones. The strategies compete for use in language games. New strategies form when 
needed by the environment or the agent’s interactions and those that do not work or 
are irrelevant get pruned. The mechanisms for generating and selecting such semantic 
strategies are not unique to language. The invention and use of tools or the planning 
of a series of actions and the retrieval and use of ready-made plans require exactly the
same sorts of capabilities.

The spontaneous generation of repertoires of complex semantic strategies already ex- 
plains some characteristics that are reflected in full-fledged languages: There is hierarchi- 
cal structure, because one strategy may call upon another strategy to achieve a subgoal, 
and a strategy may potentially call upon itself thus introducing recursivity. There is also 
a functional specialisation of categories. They are now sometimes used for filtering the 
members of a set, sometimes for ordering them, sometimes for modifying another cate- 
gory before it is used for filtering, and so on. Hierarchical structuring, recursivity and 
functional specialisation therefore do not have to originate in language. The fact that we 
see them in language structure is a reflection of the generic hierarchical and functional 
nature of semantic strategies. 19

Second, my hypothesis is that the need for communicating more and more complex 
semantic strategies and the conceptual structures they use, has pushed language users 
to start using more and more complex form characteristics, starting with word order 
but also intonation, stress patterns, function words, morphological variations of words, 
etc. Here another aspect comes into play: the ability to recognise forms and structures, 
which we also need to recognise structured objects in scenes for example, and the ability 
to assemble a structure satisfying a set of constraints. The collective dynamics which 
are responsible for the propagation and the spontaneous self-organisation of coherence 
in lexicons should also apply here. Grammars can be seen as ecologies, where form- 
meaning pairs compete in the population. New syntactic and semantic categories, new 
constructions and new uses of grammatical conventions are continuously created, and 
existing ones may destabilise and fall into disuse. 20

Each language user employs his own idiolect, which is co-ordinated as well as possible
with that of other language users, but there is never complete similarity and never absolute
stability. This explains perhaps why linguists have such a hard time pinning down
“the” language of a community. As soon as someone speaks, language changes.

According to this theory, natural language structure is not the consequence of an au- 
tonomous innate language device which evolved by a genetic mutation or series of mu- 
tations. Instead cognitive mechanisms and structures already in place have been mar- 
shalled to get a language system off the ground, even though once language became 
essential for the rapid incorporation of new individuals and the general organisation of 
activities in human populations, it must have started to recruit vast amounts of brain ca- 
pacity and stimulated other cognitive faculties, such as categorisation, episodic memory 
or problem solving, to become in turn much more complex and versatile. 21

19 The fact that languages change is, of course, well-known in linguistics even though it is less a focal research
topic than it used to be in the 19th century. See McMahon (1994) for a brief introduction into the subject.
Language change has many of the characteristics of biological change but it takes place in cultural as 
opposed to biological time. See Labov (2000). 

20 The semantic strategies are similar to the cognitive processes that have been studied extensively in cogni- 
tive grammar (Langacker 1987). Formal versions of some of these processes have been formulated as part 
of Montague grammar (Montague 1974). In computational linguistics research, various attempts have been 
made to formalise conceptualisation and meaning application strategies (Gazdar & Mellish 1989). 

21 There has been an ongoing nature versus nurture debate with respect to the origins and acquisition of 
language. The different issues in this debate are already well illustrated in Piattelli-Palmarini (1980). 
Before the problem of syntax can be tackled seriously, we must have a solid foundation
showing how agents are capable of autonomously acquiring individual words or com- 
binations of words without syntax. This is the subject of the remainder of the present 
volume. 



2.6 Conclusions 

The Talking Heads experiment examines with what kinds of mechanisms physically em- 
bodied autonomous agents might be capable of bootstrapping an ontology, a lexicon, 
and a syntax. The Talking Heads are situated embodied agents. This allows them to 
build up and co-ordinate language and meaning in tight interaction with a shared physi- 
cal environment as perceived through their cameras. The Talking Heads are social agents, 
members of a language community. The collective dynamics of the community is a cru- 
cial ingredient to understanding how successful communications of a similar complexity 
as human natural language communications might have originated and how the under- 
lying system can be maintained from one generation to the next. 

This chapter has described the main hardware and software components of the agents 
and briefly sketched their internal architecture. The operation and effect of this architec- 
ture have been illustrated with some example games. The remaining chapters explore 
the various aspects of the agent’s capacities in much more detail. First I will break the 
global task down into different subtasks, as suggested by the semiotic square. We will 
start by looking at how agents may establish the relation between a real world object and
a segmented image (Chapter 3). Then we study how they relate a segmented image to its 
conceptualisation (Chapter 4). Third, we study how a conceptualisation can be related 
to an utterance (Chapter 5). In Chapters 6 and 7 we couple these various components 
and study the lexical and ontological dynamic processes that result. 
Our minds deceive us. Intuitively we feel we see very clearly and unambiguously the
objects in our environment and are able to categorise objects or actions unequivocally in 
terms of perceptually grounded distinctions. For example, when I look out of the office 
window at the inner courtyard garden, I can clearly see the plants, trees and birds, as 
well as the walls, windows, and doors of the surrounding buildings. I can categorise their 
colours, sizes, and shapes and determine whether the leaves of the trees are moving with 
the wind. Because of this categorisation I can describe to another person what I see. 

A categorisation and conceptualisation of reality is fundamentally based on sensors 
which output signals that directly reflect, in an analog and partially unreliable way, phys- 
ical properties of the environment. For visual perception, human beings have photosen- 
sors in the eye which correlate with the amount of light, i.e. the amount of photons, 
falling on the sensor. The more photons, the stronger the sensor signal. But, a photo- 
sensor, or a matrix of such sensors as in the human retina, does not tell anything more 
than what the light intensity is at a tiny spot of the image captured by the eye. There 
is no obvious straightforward procedure that groups the spots into coherent patches. 
When you implement and try out different segmentation procedures on real world im- 
ages, you quickly find that each procedure generates a multitude of possibilities instead 
of a clear segmentation. Even if coherent segments are detected, there is the problem 
of what features of the scene are going to be used for further conceptualisation. A very 
large, open-ended set of possible feature detectors can be imagined. The quality and reli- 
ability of their output depends on the scene being processed, and is in any case strongly 
context-dependent. 

So we find that the world does not present itself neatly segmented, processed, and 
categorised. There is an enormous gap between the symbolic world of objects and cat- 
egories and the subsymbolic world of analog sensing. The brain somehow performs a 
vast amount of processing to fill this gap, without us being in the least aware of it. This 
processing takes place in parallel. Many different segmentation procedures and feature 
detectors operate concurrently on streams of consecutive images, generating hypothe- 
ses that become stronger if they are confirmed by additional evidence, or weaker if they 
do not fit into a larger picture. We are only aware of the final result and therefore have 
no intuition of what is really going on, except in rare and pathological circumstances. 
Rather than viewing perception as a straightforward step-by-step transformation of the
raw sensory data into a segmented picture, it is better to think of the whole process as a 
boiling soup with thousands of hypotheses taking shape, some of them floating like bub- 
bles up to the surface. Constraints and expectations flow down from the top so that the 
maximum amount of available information is used to construct the coherent segmented 
picture of reality we consciously experience. 




3 Perception 



The objective of this chapter is to examine this process in sufficient detail to move 
forward with the main topic of our investigations, namely language and meaning. My 
aim is not to delve into the full complexity of the visual system, because this would lead 
us too far astray from the main topic, but to have a sufficiently rich source of features so
that conceptualisation and language communication can be studied. 

3.1 What sensors sense 

Sensors and actuators are the interfaces between the physical world and the internal 
world. They are dedicated hard-wired components which grow in biological systems 
strongly influenced by genetics and environmental inputs. 1

A sensor transduces external physical states into internal states. For example, a touch 
sensor, of which there are hundreds of thousands all over the human body, transduces 
mechanical energy into neural signals. Other sensors respond to the intensity of certain 
sectors of the wave spectrum. Ears respond to audible waves. Photoreceptors in the eye 
respond to visible light waves. The body also has a large number of biochemical sensors 
responsive to internal chemical states so that we can feel the hunger in our stomach 
for example. In addition to sensors, bodies carry actuators, which transduce internal 
states into mechanical energy and thus make it possible to perform actions in the world. 
Actuators receive continuous signal streams and modulate their activity based on these 
signals. 

3.1.1 Artificial sensors and actuators 

Analogs of biological sensors can be artificially recreated to give robots sensory capa- 
bilities. Something similar to a touch sensor can be created using a contact switch that 
passes current when closed. A microphone has a functionality similar to the ear. A 
digital camera can be used to recreate the functionality of an eye; it records the light in- 
tensity of millions of small rectangles of the image, known as pixels, just like the retina. 
The set of all pixels is usually called an image map. Digital colour cameras provide three 
information elements for every pixel: the amount of red, green and blue of each pixel 
(RGB). We can build sonar sensors responding to high-frequency sound waves, just like the ones
bats use for navigation, or infrared sensors which are useful for obstacle avoidance be- 
cause the amount of infrared light emitted and captured back by the sensor correlates 
with distance to objects. 

A sensor should never be seen as measuring accurately an abstract property of real- 
ity. For example, an infrared sensor does not measure the distance to an obstacle. The 
infrared reflection depends not only on distance but also on the reflection properties of 
the object and on the amount of background infrared in the environment, thus distance 
must be inferred and projected onto reality. Similarly, the colour receptors do not really 

1 Possible biological implementations of the various sensors and sensory processes discussed in this chapter 
can be found in Churchland & Sejnowski (1992). The nature and neurophysiological implementation of 
visual processes are discussed in detail in Zeki (1993). 
measure colour. They respond to reflections within segments of the wavelength spectrum,
but this reflection is again determined by many factors: how much and what kind
of light falls on the object, and how the object reflects the light, which varies
with the position of the object. So colour is again a more abstract notion that needs to
be reconstructed and projected on the image based on complex processing.

The Talking Heads use vision as the main source of sensory information because most 
of the perceptually grounded concepts used in language derive from visual sensing. The 
only actuator (apart from the speech synthesiser which I will not discuss in detail) is a 
pan/tilt motor for turning the head up and down or left and right. 

3.1.2 Natural sensing 

It is technically possible to build all sorts of sensors and actuators that have little to 
do with human capabilities. But clearly, categories and languages similar to
those of humans will only emerge if the sensors and actuators
are at least to some degree similar to those of humans. This is not always technologically 
possible and when it is, it often requires non-trivial transformations of artificial sensor 
data. The Talking Heads use digital cameras as their main sensory source. These cameras 
give output in RGB (red, green, blue intensities) because this is the standard in current 
computer technology. But human vision operates on four quite different channels: the 
two chromatic opponent channels which provide a value on the yellow-blue and red- 
green dimensions, an achromatic channel which holds the brightness or light intensity, 
and a saturation channel which reflects the degree of saturation or purity of a colour. 
These channels can be reconstructed from RGB by a complex transformation. 2 
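
As a rough illustration of what such a transformation looks like, the sketch below maps an RGB triple onto opponent-style channels. The formulas are a deliberately simplified assumption (plain linear differences of the RGB components) and are not the CIE-based conversion used in the experiments; the function name `opponent_channels` is hypothetical.

```python
def opponent_channels(r, g, b):
    """Map RGB intensities in [0, 1] to simplified opponent-style channels.

    Assumption: plain linear differences of the components, not the
    CIE-based conversion mentioned in the footnote.
    """
    rg = r - g                      # red-green: positive means reddish
    yb = (r + g) / 2.0 - b          # yellow-blue: positive means yellowish
    brightness = (r + g + b) / 3.0  # achromatic channel
    hi, lo = max(r, g, b), min(r, g, b)
    sat = 0.0 if hi == 0 else (hi - lo) / hi  # purity of the colour
    return {"rg": rg, "yb": yb, "brightness": brightness, "sat": sat}
```

A pure red pixel comes out strongly positive on the red-green channel and fully saturated, whereas a neutral gray pixel is zero on both chromatic channels.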

3.1.3 Behaviours 

Although language requires that the image is segmented and categorised, a lot can al- 
ready be done with raw or lightly processed sensory data by dynamically coupling them 
to signals controlling the actuators. The neural abilities of many simple animals probably
do not get much further than this, and it has already turned out to be a good way to
construct the basic behavioural layers of robots that have to operate autonomously in a 
real-world environment in real time. 

For example, suppose you want a robot of the sort seen in Figure 1.3 to move towards 
a light source. As originally suggested by the cybernetician Braitenberg (1984), we can 
do this by mounting two simple light sensors to the left and right of the body and by 
implementing a direct dynamical coupling between the output of the light sensors and 
the motors. The coupling is such that when the sensed left light is stronger than the 
sensed right light, the motor attached to the left wheel is slightly decreased and the 



2 There is a vast literature on colour perception from different points of view (physics, psychology, neuro- 
physiology, linguistics). A representative sample is contained in Byrne & Hilbert (1997). In some of the 
experiments described later, the RGB values are converted to the CIE XYZ colour coding and from that 
to the opponent channels. These algorithms have been implemented by Michael Pollitis. See: Kaiser & 
Boynton (1996). 
right motor slightly increased, so that the robot veers to the left. When the right light
is stronger, the right motor is increased and the left motor decreased, so that the robot 
veers to the right. When these two dynamical couplings are put in place together with 
a forward movement, a zig-zag behaviour emerges which brings the robot to the light 
source. 
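
The coupling just described can be sketched in a few lines. This is only a minimal illustration: the function name `phototaxis_step` and the parameters `base_speed` and `gain` are assumptions made for the example, not values or code from the actual robot.

```python
def phototaxis_step(left_light, right_light, base_speed=1.0, gain=0.5):
    """Return (left_motor, right_motor) speeds for one control step.

    Assumption: light readings and speeds are plain floats; `base_speed`
    and `gain` are illustrative parameters, not values from the experiment.
    """
    diff = left_light - right_light
    # Stronger light on the left: slow the left wheel and speed up the
    # right wheel, so the robot veers left (and symmetrically for the right).
    left_motor = base_speed - gain * diff
    right_motor = base_speed + gain * diff
    return left_motor, right_motor
```

Calling this function repeatedly with fresh sensor readings, on top of the constant forward speed, yields the zig-zag approach towards the light.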

Basic behaviours can be put together to form more complex behaviours. 3 Figure 3.1 
shows a recording of the internal states of the robot in Figure 1.3 as it is performing pho- 
totaxis towards a light source, using two simple light sensors, and touch-based obstacle 
avoidance behaviour. When the robot strikes the box housing the light, it moves back in 
a kind of reflex behaviour triggered by the touch sensors mounted at the front. The Left- 
Light and RightLight sensory channels and the LeftMotorSpeed and RightMotorSpeed 
are shown in Figure 3.1. When the robot strikes the obstacle housing the light, its left
and right motor speeds are pulled down to negative values so that it moves backwards. Then the
robot zig-zags to the light source until it hits the obstacle again.




Figure 3.1: Internal states of a robot’s sensory and actuator channels on the y-axis and 
time in periods of 1/40 seconds on the x-axis. The robot pushes against a box 
holding a light source. 

These channel recordings illustrate clearly that sensor or actuator data is continuous 
in time and rapidly fluctuates in response to changes in the environment or the behaviour 



3 A representative sample of experiments in this direction is reported in: Steels & Brooks (1995). See also: 
Arkin (1998). 
of the agents. Coupling sensory data to actuators is effective for quick reaction without
the need for higher level processing. If an obstacle is rapidly approaching, it is important 
to get out of the way rather than trying to figure out what kind of obstacle it is. The 
observed behaviour is very complex, even though the underlying dynamical systems 
are relatively simple; the complexity is due to the complexity of the real world with 
which the robot is interacting. 

Similar behaviour systems and networks have been experimented with for more com- 
plex tasks and it is actually the way that the Mars rover discussed in Chapter 1 works. 4 
However, the gap between the continuous dynamics of sensori-motor intelligence and 
cognition remains unbridged. It is possible for us to see structure in the data, but this
structure is neither perceived nor used for control by the robot itself. The robot does not
segment or categorise reality. It does not “know” that it is moving left or right and
therefore cannot communicate this information to another robot. All processing remains at
the analog continuous level. Because processing remains at this subsymbolic level, it is
doubtful whether we can ever hope to bootstrap cognitive intelligence simply by adding
more of these dynamical systems. 5 The difference between behavioural intelligence and
cognitive intelligence resides in an additional layer of processing which is no longer con- 
tinuous and analog but discrete and symbolic. How this second symbolic layer could be 
formed but at the same time remain grounded in the analog sensori-motor layer is one 
of the main research topics addressed in this book. 

3.2 Segmentation 

The first step that bridges the gap between sensory layers and cognitive processing is 
segmentation. Segmentation means that a sensory data stream is divided into units either 
in space or in time. In the Talking Heads experiment, the environment is restricted 
to static images only, so that segmentation amounts to aggregating pixels into spatial 
patches. 

A patch may have an irregular shape which makes it hard to apply some classifica- 
tions without complex processing. Bounding boxes are much easier to compute and are 
already very useful. The bounding box of a segment is a rectangle around the contours 
of a segment (Figure 3.2). 
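
To show how little computation a bounding box requires, here is a minimal sketch. It assumes that a segment is represented as a collection of (x, y) pixel coordinates; this representation and the name `bounding_box` are hypothetical.

```python
def bounding_box(pixels):
    """Return (min_x, min_y, max_x, max_y) for a segment.

    Assumption: a segment is given as a non-empty iterable of (x, y)
    pixel coordinates; this representation is hypothetical.
    """
    xs, ys = zip(*pixels)
    return min(xs), min(ys), max(xs), max(ys)
```

However irregular the segment, the box is obtained in a single pass over its pixels.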

It is generally not necessary for segmentation to always yield parts of the images that 
correspond to what we would call objects. This is an impossible demand. What counts 
as an object is to some extent task-dependent. Segmentation yields information that can 

4 At the risk of simplifying, we can say that the early work in cybernetics (such as that of Braitenberg (1984)) 
has focused on this subsymbolic layer and that Artificial Intelligence, as a research field emerging in the late 
1950s, has focused on the symbolic layer. Some researchers involved in a bottom-up approach to Artificial
Intelligence strongly refocused on the subsymbolic layer. One of the most vocal advocates of this position 
is Brooks (1991). Obviously we need the two, but bridging the gap is a non-trivial problem, sometimes 
known as the grounding problem. See: Harnad (1990). 

5 Marr (1982) remains a classical reference outlining the features that can be extracted, the principled algo- 
rithms for doing it, and possible neurophysiological models. These are the main research topics that still 
dominate research in vision. See Ullman (1996) for an overview. 
Figure 3.2: Example of a scene captured by the camera in Figure 1.6, containing a puppet,
a wooden house, and a horse. Segmentation is based on aggregating grayscale 
patches, i.e. areas in the image that are lighter or darker than the general 
background. Bounding boxes have been drawn around the segments. Note 
that there can be bounding boxes within bounding boxes if an object forms 
part of a larger object. 



be used for object identification but should not be equated with it. This is well illustrated 
in Figure 3.2. Segmentation has been done here by extracting segments whose average
grayscale value is lower or higher than the average in the scene. The segments
that have been obtained are not necessarily those that we humans identify, because we 
use knowledge of the objects involved. This shows clearly that higher level constraints 
play a role even at the very first basic visual processing layers. 

3.2.1 Feature extraction 

Many methods for segmentation have been reported in the computer vision literature 
and all of them are useful even though they give slightly different results. Most methods 
assume that additional low level processing is first performed on the image map to detect 
small-scale structures, such as: 

• Edges, which are possible boundaries between two surfaces. Edges can be aggre- 
gated in line segments. 

• Junctions, which are regions where lines come together. 

• Patches, which are regions where the colour or the grayscale values are relatively 
constant. 

• Textures, which are small-scale regular surface markings, for example blobs. 

• Optical flow, which consists of vectors indicating the direction and velocity of moving
brightness patterns. 
• Distance from the observer, computed by matching two image maps from binocular
vision.

• Shadings, recovered from continuous variations in brightness. 

The recovery of such features is in itself a non-trivial topic of research and a huge liter- 
ature as well as many software libraries now exist. 

The visual layer of the Talking Heads only extracts patches and edges, each of which
leads to one way of segmenting the scene. Segmenting based on patches attempts to
aggregate those parts of the image into patches (a process called region growing) that 
have more or less the same colour. More or less the same is of course a relative notion 
and larger or smaller segments will be found depending on the thresholds that are used 
for deciding whether a colour is similar or not. Segments that are too small are not 
considered further. Region growing starts by comparing for each pixel how similar it 
is to neighbouring pixels. Similar pixels are grouped as a patch. In the next step, each 
patch is examined to see whether it can be merged with a neighbouring patch, and so on 
recursively until patches cannot be combined further. Small patches or individual pixels 
that stand on their own are included as part of a larger patch so that we get sufficiently 
broad patches. 
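
The core of region growing can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the Talking Heads: the image is a list of rows of grayscale values, patches are grown in a single flood-fill pass rather than by recursive merging, and patches that are too small are dropped rather than absorbed into a larger neighbour. The names `grow_regions`, `threshold` and `min_size` are hypothetical.

```python
from collections import deque

def grow_regions(image, threshold=0.1, min_size=2):
    """Group 4-connected pixels with similar values into patches.

    Assumptions: `image` is a list of rows of grayscale values; two
    neighbouring pixels are "similar" when they differ by at most
    `threshold`; patches smaller than `min_size` are dropped.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    patches = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0]:
                continue
            seen[y0][x0] = True
            patch, queue = [], deque([(x0, y0)])
            while queue:
                x, y = queue.popleft()
                patch.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and abs(image[ny][nx] - image[y][x]) <= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(patch) >= min_size:
                patches.append(patch)
    return patches
```

Raising `threshold` produces fewer and larger patches, lowering it produces more and smaller ones, which is exactly the relativity of "more or less the same colour" noted above.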

Segmentation based on edges starts by first detecting the edges themselves, which are 
colour discontinuities suggesting a boundary between two surfaces. The edges are then 
aggregated into lines, and these lines are connected to find the contours of an object. This 
method works well for the simple objects used in the Talking Heads experiment. Things 
are no longer so simple when contours are less clear, for example because they fuse with 
the background. Further complications arise when one object partly obstructs another 
object or when a set of lines can ambiguously be organised in different configurations (as 
in visual illusions like the Necker cube). This illustrates that one segmentation method 
must be balanced with others to offset unclear areas or local ambiguities. 




Figure 3.3: Left view shows an image as captured by a Talking Heads camera. Middle 
view shows the result of segmentation based on edge detection. Right view 
shows the integration of segmentation by colour and by edge detection.
The results of applying these two segmentation methods can be seen in Figure 3.3. The
left picture shows the image map itself as it is captured by a Talking Head camera. The
middle image shows the result of segmenting based on edge detection. The contours of
two objects have been found. Note that these contours are not straight lines as one might
expect. They are obtained by connecting together line segments which are themselves
based on connected edges. The right image shows the combination of edge detection
and segmentation based on patches. Because the green objects stand out clearly against
the white background, they are easily recognised by the combination of these segmentation
methods. Given the simplicity of the Talking Heads environment (geometric shapes
on a white board), segmentation based on colour and on edge detection generally yields
a segmentation that corresponds to the individual objects humans perceive in a scene.

3.2.2 Divergent perception 

There is no guarantee that two Talking Heads looking towards the same area of the 
white board perceive exactly the same image. In fact, the contrary is true. Because the 
robots are physically grounded and situated in a particular context, standing roughly 
one meter apart from each other (as shown in Figure 1.1), they cannot see the same 
scene from exactly the same vantage point, and so images diverge. On the edges of the 
field of view, the differences might become so significant that different objects are seen 
and consequently different categories used. 

Compare for example the recorded images for a hearer (top) and a speaker (bottom)
as shown in Figure 3.4 (to the left). The same figure shows the segmentation performed 
by both agents in a separate window (to the right). The images are clearly different 
because they have been taken from slightly different camera positions and the hearer 
only approximately perceives in which direction the speaker is pointing. Agent a1 (top
of Figure 3.4) has recovered the two circles, but not the rectangle, which was deemed too small
to be relevant. Agent a2 (bottom of Figure 3.4) has recovered the rectangle in the left top corner
but only one circle. The yellow circle was not recognised by a2 due to slightly different 
reflections perceived from a2’s angle of view, so that the yellow surface was no longer 
distinguishable from the white background. 

Usually the situation is not so divergent, and even if there is divergent perception, the 
categorisation used by the speaker may still be compatible with the same topic for the 
hearer. Nevertheless, we must take into account that divergent perception of the scene 
might considerably confuse language communication and the feedback the speaker gives 
to the hearer. This is one example where it is useful to do grounded experiments because 
it shows clearly a major issue (namely perceptual and hence categorical divergence) 
which is usually swept under the carpet. 

3.2.3 The sieve architecture 

Segmentation exemplifies a dual kind of processing that we will encounter again and 
again as we focus on the different layers of the cognitive architecture (Figure 3.5). Var- 
ious possible solutions are generated, expanded and combined in parallel. The possible 
Figure 3.4: Top (left), image captured by the hearer. Bottom (left), image captured by the
speaker. Because both have a different view on the scene, the images diverge
and consequently the segmentation (bottom and top right) diverges as well;
in particular the yellow circle is not perceived by the speaker. In contrast the
speaker perceives a rectangle (bottom left corner) and has chosen this as the 
topic. The rectangle is not perceived by the hearer, so the game has no chance 
to be successful. 



solutions enter into competition until globally coherent solutions emerge, which are 
ranked and handed to the next layer of processing. Often a solution does not emerge or
multiple solutions are equally valid, in which case constraints from expectations or from
the further processing of partial solutions must flow down to influence earlier processing.
This can be implemented as a re-entry of some solutions back into the previous
layer. For example, hearing a word may stimulate the expectation that a certain category
is relevant, which in turn may stimulate the computation of certain features and influence
how the image is segmented. A speaker may have to choose between alternative
segmentations and categorisations depending on whether the conceptualisation can be
succinctly and accurately lexicalised, and possible lexicalisations will only be acceptable
when they can be integrated in a complete sentence.
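
One step of this generate-rank-and-pass-on scheme can be sketched as below. The sketch is purely illustrative: hypotheses are arbitrary values, `score` stands for whatever bottom-up evidence a layer has, and `expectation` is a stand-in for the top-down constraints; all of these names are hypothetical and the real architecture runs such steps in parallel across layers.

```python
def sieve_layer(hypotheses, score, keep=2, expectation=None):
    """Rank competing hypotheses and pass only the best ones upward.

    Assumptions: `score` maps a hypothesis to a number; `expectation` is an
    optional predicate standing in for top-down constraints; all names here
    are hypothetical.
    """
    def total(h):
        # Top-down constraints boost hypotheses that match the expectation.
        bonus = 1.0 if expectation is not None and expectation(h) else 0.0
        return score(h) + bonus
    ranked = sorted(hypotheses, key=total, reverse=True)
    return ranked[:keep]
```

Without an expectation the layer simply hands on its strongest hypotheses; when a higher layer supplies one, a hypothesis that was weak on bottom-up evidence alone can still come out on top.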

The two way flow of constraints (from perception to high level cognitive processing 
and from high level cognitive processing to perception) suggests that the brain is best 
thought of as a massively parallel, densely connected processing system, in which mul- 
tiple partial solution structures float up or down, gathering strength or weakening with 
new information entering the system. This contrasts with the view that information is 
processed in a serial step-by-step fashion through tightly compartmentalised modules. 6 



6 See Fodor (1983). This book discusses a modular, sequential information processing architecture for cog- 
nition. A non-modular, parallel view is sketched in: Minsky (1986). A more realistic neurophysiological 
model similar to the one underlying the sieve architecture we have used is discussed by Edelman (1987). 
Figure 3.5: The different layers of cognitive processing act like a sieve. Inputs flow into
the layer, where they are processed to generate hypotheses of which the best 
ones are transmitted to the next layer. Every layer can operate in both direc- 
tions, so that constraints can flow from top to bottom and from bottom to top. 



It is true that we have the illusion that there is some homunculus, a little man, which
sees a single coherent picture of reality and quickly finds the right way to verbalise or
interpret this picture, but closer examination of the actual informational requirements of
each subprocess shows that this can never work. Constraints must flow in all directions,
because the sensory data does not contain enough information to allow a unique segmentation,
and categorisation and language are too full of ambiguities to allow a straightforward
linear interpretation.



3.3 Sensory channels 

Once segments have been found, further characteristics must be extracted. The values 
of these various characteristics will be communicated on sensory channels to later cate- 
gorisation processes. A characteristic property of a segment, such as average gray-scale, 
is still in the analog continuous domain and should not be confused with a category (like 
dark or light) which is in the discrete symbolic domain. An open-ended set of possible 
sensory channels can be imagined, ranging from very general channels sensitive to often 
recurring properties relevant in common tasks and thus shared by most people, to very 
specific channels which only experts in specific domains possess. 

3.3.1 Example channels 

The segment characteristics which will be used later in various experiments are defined 
below. Their values are all derived by straightforward computations from the raw image 
maps captured by the cameras. 
• area: The surface area of a segment is calculated by simply counting the number
of pixels that are part of the segment. 

• hpos, vpos: The x and y values of the central position of a segment. The central
position is calculated by taking the mid-point of the sides of the bounding box. 

• height: The height of the bounding box. 

• width: The width of the bounding box. 

• bb-area: The area of the bounding box, calculated by multiplying height by width. 

• gray: The average gray-scale value of the pixels in a segment. 

• r, g, b: The average r (redness), g (greenness), and b (blueness) values in a segment.
To obtain more human-like colour channels, they are transformed into the yb
(yellow-blue), rg (red-green), bw (black-white), sat (saturation) and brightness
channels.

• edge-count: The number of edges in a segment, for example 3 in the case of a 
triangle. 

• angle-count: The number of angles, determined on the basis of the junctions. 

• ratio: The ratio between the area of the segment and the area of its bounding box, 
which gives an idea of how close a figure is to a rectangular shape.
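
Several of the channels listed above can be computed together in one pass, as the following sketch suggests. It assumes that a segment is a list of (x, y) pixel coordinates and that grayscale values are available in a dictionary keyed by coordinate; both representations, and the name `segment_channels`, are hypothetical.

```python
def segment_channels(pixels, gray):
    """Compute a few sensory channels for one segment.

    Assumptions: `pixels` is a list of (x, y) coordinates belonging to the
    segment and `gray` maps (x, y) to a grayscale value; both are
    hypothetical representations. Width and height are counted in pixels.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    area = len(pixels)                      # number of pixels in the segment
    bb_area = width * height                # area of the bounding box
    return {
        "area": area,
        "hpos": (min(xs) + max(xs)) / 2.0,  # mid-point of the bounding box
        "vpos": (min(ys) + max(ys)) / 2.0,
        "height": height,
        "width": width,
        "bb-area": bb_area,
        "gray": sum(gray[p] for p in pixels) / area,
        "ratio": area / bb_area,            # 1.0 for a filled rectangle
    }
```

For an L-shaped segment of three pixels inside a 2 x 2 bounding box, the ratio channel is 0.75, reflecting that the shape fills three quarters of its box.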

Each of these segment characteristics or combinations of characteristics enables cer- 
tain types of categorisations. For example, the gray channel makes it possible to dis- 
tinguish between light and dark, height between short and tall, hpos between left and 
right, vpos between top and bottom, area between big and small, etc. In the next chap- 
ter, we will study how such categorisations may form on the basis of their respective 
channels. 

3.3.2 Conceptual spaces 

The sensory channels used in the experiments have deliberately been kept to a minimum 
so that we can follow in detail the ontological and lexical dynamics, which is a non-trivial 
matter as I will demonstrate in Chapters 6 and 7. Obviously to get a richer lexicon, more 
channels need to be made available to the agents. These channels can often be grouped to 
form a conceptual space. 7 One of the best known examples is the colour space formed
by the yellow-blue, red-green, black-white, saturation, and brightness channels. 



7 See Gärdenfors (2000). More cognitively oriented spaces, further removed from perception, are discussed in
Fauconnier (1994). 
Here are some other examples:

1. A set of sensory channels that are sensitive to characteristics of moving segments 
can easily be imagined. They include the speed of movement, the direction of 
movement (along the horizontal position (hpos) and vertical position (vpos) di- 
mensions), or the change in area (getting bigger if the object approaches or smaller 
if it moves away). This is the foundation for categories about spatial change: moving
left versus moving right, approaching versus receding, speeding up versus
slowing down. 8

2. Various spatial relations between segments can be computed: inclusion and over- 
lap between segments, distance between midpoints, hierarchical structure, etc. 
This leads to another batch of categories which are the basis for spatial distinc- 
tions like inside versus outside. 

3. More properties of the shape of segments can be computed. I have already intro- 
duced the ratio channel, which is the area of the segment divided by the area of 
the bounding box, giving an indication how rectangular a segment is. This can be 
generalised by using an ellipse around a shape so that we can calculate convex- 
ity. We can then also calculate elongation (by calculating the principal axes of the 
ellipse). 

It is not at all necessary that the channels operate on visual input; they can also use
actuator sensors or internal states like the level of the battery. It is moreover possible 
to consider channels for dynamic states, for example by transforming the sensor and 
actuator data into state-space representations and analysing them in terms of attractors. 9 

The ontological and lexical apparatus of the Talking Heads is generic with respect to 
the nature of the sensory channels that are used; they could just as easily be about other
sensory domains like sound, tactile sensation or internal states of the robots. 

3.3.3 Perceptual constancy 

Real world sensory data is remarkably volatile due to the high variation and constant 
change of our physical environment; nevertheless, human beings have the illusion of
constancy. For example, the colour of an object appears to be determined by the wavelength of
the light reflected by its surface. For a long time, it was thought that colour was an 
intrinsic property of a surface, and that we therefore experience the colour of an object 
as constant, but psycho-experimental evidence has shown that this is not true. When 
we look at a surface in isolation which reflects light between 430 and 470 nanometres, 

8 Experiments using segmentation based on movement using the dedicated hardware shown in Figure 1.5 
were carried out by Tony Belpaeme. More details can be found in: Belpaeme, Steels & van Looveren 
(1998). These channels have been used in language experiments where the agents were perceiving and
communicating about moving images: Steels (1998). The segmentation used in this book based on output
from the Sony EVI-D31 camera was implemented by Danny Van Tieghem. Angus McIntyre integrated the 
interfaces to this camera within the babel environment. 

9 Several examples of this type of approach are found in Port & van Gelder (1995). 
we experience the colour blue. As a result, one might assume that blue can be equated
with the experience of light in this particular wavelength region. However, if the same
surface is perceived in a broader context and the light conditions are changed (so that
objectively it now reflects light in a different spectral region), it still appears as blue!
What should objectively appear green according to the measured wavelength values is 
still experienced as blue. This explains why we see colours as relatively constant even 
if the light conditions are changing (for example from broad daylight to evening light) 
which is obviously extremely useful for dealing with rich environments. But it implies 
that colour is not an intrinsic property that can be measured objectively with a light 
meter. It is actively mapped onto reality as a result of complex processing which takes 
the context into account. 10 

The same sort of context-sensitivity is important for the other sensory dimensions that 
the Talking Heads use, as well as for segmentation. An object will appear large or small 
depending on the other objects in the scene. It will appear light or dark depending on 
the average brightness. A scene may be segmented in one way or another depending on 
the objects it contains. Visual illusions arise when more than one context is consistent 
with an image and interpretations sometimes switch back and forth between different 
possibilities. 

3.3.4 Transformations 

Cognitive agents can stabilise the sensory data to achieve more perceptual constancy 
by transforming them so that the data become less context-sensitive. This implies that 
additional processing first recovers more information about the context. One of the best 
known examples of this concerns colour constancy. As mentioned earlier, wavelength 
reflection is strongly influenced by the background light in the environment. However, 
when the average surface reflectance is known, it is possible to transform the colour data 
to recover the colour that is experienced by humans as constant for a surface. 11 
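
A very simple way to illustrate this kind of transformation is the so-called gray-world heuristic, sketched below. Note the substitution: instead of a known average surface reflectance, the sketch assumes that the average colour of the scene should be neutral gray and rescales each channel accordingly. This is an illustrative stand-in, not the method used in the experiments, and the name `gray_world` is hypothetical.

```python
def gray_world(pixels):
    """Rescale RGB pixels so that the scene average becomes neutral gray.

    Assumption: `pixels` is a non-empty list of (r, g, b) tuples with
    non-zero channel averages; the gray-world heuristic stands in for
    knowledge of the average surface reflectance.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    target = sum(means) / 3.0
    # Scale each channel so that its mean equals the overall mean.
    return [tuple(p[c] * target / means[c] for c in range(3)) for p in pixels]
```

After the transformation a reddish cast caused by the illumination is removed from every pixel, so that the colour values depend less on the lighting and more on the surfaces themselves.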

3.3.5 Scaling 

The second way in which the erratic nature of real world signals can be diminished is by 
scaling. Scaling means that the values actually recorded by sensors or feature extractors 
are calibrated with respect to a frame of reference. 

A first frame of reference is based on the absolute minima and maxima of the values 
on a sensory channel. This can be used to do sensor-oriented scaling. For example, 
the image map captured by the camera contains 320 x 240 pixels, which means that the 
horizontal position (hpos) has a value between 0 and 320 and the vertical position (vpos) 
has a value between 0 and 240. Both values can be scaled to fall between 0.0 and 1.0 so 
that they become uniform with respect to other sensory channels. Given a value x and a 



10 See the discussion about colour in: Varela, Thompson & Rosch (1991).

11 See the discussion in: Zeki (1993), particularly Chapter 23. 
min and max value, the scaled value is $x' = (x - \min)/(\max - \min)$. Thus the
sensed value of hpos=200 becomes hpos=0.625 after scaling.
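
Sensor-oriented scaling amounts to a one-line function. The sketch below applies it to the image dimensions mentioned above; the function name `scale` and the sample vpos value are chosen only for illustration.

```python
def scale(x, lo, hi):
    """Map x from the range [lo, hi] onto [0.0, 1.0]."""
    return (x - lo) / (hi - lo)

hpos = scale(200, 0, 320)  # 200 / 320 = 0.625
vpos = scale(120, 0, 240)  # 120 / 240 = 0.5
```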

In the experiments discussed later, sensor-scaling is always performed so that all sen- 
sory data is between 0.0 and 1.0, allowing values on different channels to be compared 
with each other. In addition, context-oriented scaling is performed. Context-oriented 
scaling uses the minima and maxima of the values that effectively appear in the context. 
For example, if the values for the height of segments observed in the scene are between
500 and 700, then these can be the minimum and maximum of the height scale, so that 
500 maps onto 0.0 and 700 to 1.0. Context-oriented scaling acts like a lens magnifying 
the perceived values so that distinctions stand out more clearly. This scaling could be 
made more sophisticated by introducing an additional scaling factor so that differences 
are amplified but not necessarily blown up to their extremes. 
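
Context-oriented scaling, including the optional extra factor just mentioned, might be sketched as follows. The parameter `factor` is a hypothetical way of realising that factor: at 1.0 the observed values are stretched over the full range, while smaller settings amplify differences less strongly.

```python
def context_scale(values, factor=1.0):
    """Scale values relative to the minimum and maximum found in the context.

    Assumption: `factor` (between 0 and 1) is a hypothetical damping
    parameter; 1.0 stretches the values over the full 0.0-1.0 range, while
    smaller settings pull them towards the middle of the scale.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no differences to amplify
    return [0.5 + factor * ((v - lo) / (hi - lo) - 0.5) for v in values]
```

With heights of 500, 600 and 700 the default setting yields 0.0, 0.5 and 1.0, whereas `factor=0.5` yields 0.25, 0.5 and 0.75.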

Context-oriented scaling has two advantages. The context is now strongly taken into 
account, in a way similar to human perception: A segment looks dark when surrounded 
by light segments but light when surrounded by darker ones. The second advantage is 
that differences within the relevant subrange that actually occur in the scene are ampli- 
fied, so that they stand out even if the range of possible values is much wider. Often both 
relative (context-oriented) and absolute (sensor-oriented) scaling are pertinent. Thus left 
can be the left-most of all the objects in the scene (context-oriented scaling) or left in 
absolute terms (sensor-oriented scaling). Some channels, such as the colour channels, 
should never be scaled for context, because the categorisation works best on the basis of 
the sensor-scaled values. 

Context-oriented scaling is not necessarily based on the frame of reference imposed 
by the image map which is recorded by perceiving the scene from the viewpoint of 
the camera. Many contexts and hence other frames of reference are possible and are 
exploited in natural language conversations. Often a particular context is communicated 
through language. Scaling is then performed within the frame of reference suggested by 
that context, for example when I tell someone ‘the chair is to your left’, even though from
my point of view it may be to the right of the hearer.

A final form of scaling uses the typical values of the object being perceived. I will call 
this object-oriented scaling. For example, a small elephant is always very large next 
to a large mouse. We clearly have expectations about the typical size of an elephant at a 
certain distance and use it to scale perceptual data prior to size categorisation.
Context-oriented and object-oriented scaling are examples of how constraints must flow from
non-visual cognitive processes to visual processing. These non-modular interactions are a
strong indication that cognitive subsystems must be highly interconnected.
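A toy sketch of object-oriented scaling makes the elephant and mouse example concrete. The
typical size ranges below are invented purely for illustration (arbitrary units), and the
names TYPICAL_SIZE and object_scale are hypothetical; the text does not specify how such
expectations are represented.

```python
# Hypothetical typical size ranges (arbitrary units), invented for this example.
TYPICAL_SIZE = {
    "elephant": (200.0, 400.0),
    "mouse": (0.5, 1.5),
}


def object_scale(size, kind):
    """Scale a perceived size relative to the typical range for that kind of object."""
    lo, hi = TYPICAL_SIZE[kind]
    return (size - lo) / (hi - lo)


small_elephant = object_scale(220.0, "elephant")  # low on the elephant scale
large_mouse = object_scale(1.4, "mouse")          # high on the mouse scale
```

The elephant is far bigger in absolute terms, yet after object-oriented scaling it is the
mouse that would be categorised as large.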

3.3.6 Saliency 

The perceptual layer generates in parallel a wealth of sensory characteristics about each 
of the segments in an image and their relationships. But not all sensory characteristics 
are equally distinctive. For example, two segments in a scene may have almost the same 
area but one might be very thin and thus tall, and the other very wide and thus short. In
our own human perception, those channels that reflect significant differences stand out
and are preferred in referring to an object. Therefore, it is unlikely that area would be
used in Figure 3.6 to distinguish object 1 from the others.

Figure 3.6: A scene with three objects. The sensory values on the grayscale channel are
the most salient and will therefore be chosen preferentially for categorisation
and verbalisation.

The saliency of a channel is the smallest distance (after sensor-scaling) between the
perceived value of the topic and the corresponding perceived values of the other
segments in the context. The perceived values (after sensor-scaling) for the
segments in Figure 3.6 are shown in Table 3.1. The last line shows the saliency, assuming
that the topic is segment 1. Clearly the GRAY-channel is the most salient one in this case,
followed by the WIDTH-channel. Other sensory channels such as hpos, vpos or height
(which have almost the same values for objects 0 and 1) are not salient at all.
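The definition of saliency can be sketched as follows, using the values of Table 3.1. The
data layout and the function name saliency are assumptions made for this example. Because
the computation starts from the rounded table entries, some results differ by 0.01 from the
printed saliency row (for example 0.38 instead of 0.39 for WIDTH), presumably because the
printed row was derived from unrounded values.

```python
# Sensor-scaled values from Table 3.1, indexed by segment number.
TABLE_3_1 = {
    0: {"HPOS": 0.66, "VPOS": 0.95, "HEIGHT": 0.01, "WIDTH": 0.71, "GRAY": 0.19, "AREA": 0.27},
    1: {"HPOS": 0.69, "VPOS": 0.83, "HEIGHT": 0.07, "WIDTH": 0.33, "GRAY": 0.97, "AREA": 0.21},
    2: {"HPOS": 0.99, "VPOS": 0.87, "HEIGHT": 0.54, "WIDTH": 0.72, "GRAY": 0.22, "AREA": 0.57},
}


def saliency(segments, topic):
    """Per channel: smallest distance between the topic and any other segment."""
    topic_values = segments[topic]
    return {
        channel: min(
            abs(value - other[channel])
            for name, other in segments.items()
            if name != topic
        )
        for channel, value in topic_values.items()
    }


sal = saliency(TABLE_3_1, topic=1)
ranked = sorted(sal, key=sal.get, reverse=True)  # most salient channel first
```

With segment 1 as the topic, the ranking starts with GRAY followed by WIDTH, in line with
the discussion above.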

Determining saliency has to be done before context scaling, because context-scaling 
stretches the values to their extremes and so saliency information is lost. Table 3.2 shows 
for the scene shown in Figure 3.6 the area channel with its raw data, the value after 
sensor-scaling (with minimum 1,000 and maximum 10,000), and the value after context- 
scaling. 
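Why saliency must be computed before context-scaling can be checked with a small sketch. It
uses the sensor-scaled area values of Table 3.2 and, as a second made-up case, two nearly
equal values in a two-object context. The function names are assumptions for this
illustration.

```python
def saliency(values, topic):
    """Smallest distance between the topic's value and any other value."""
    return min(abs(values[topic] - v) for i, v in enumerate(values) if i != topic)


def context_scale(values):
    """Stretch values so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Sensor-scaled area values for objects 0, 1 and 2 (Table 3.2); the topic is object 1.
area = [0.27, 0.21, 0.57]
before = saliency(area, topic=1)                 # about 0.06
after = saliency(context_scale(area), topic=1)   # about 0.17

# Two nearly equal values: context-scaling pushes them to opposite extremes.
close = [0.10, 0.11]
close_before = saliency(close, topic=0)                # about 0.01
close_after = saliency(context_scale(close), topic=0)  # 1.0
```

After context-scaling, a channel with an almost negligible difference looks just as
distinctive as a truly salient one, which is exactly the information that gets lost.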

Another example based on the segmented image is shown in Figure 3.7 (top images). 
Two circular segments have been identified, the others are ignored because they are 
too small. The different sensory values (after sensor-scaling) for the segments in the 
speaker’s image are shown in Table 3.3. 

The table shows clearly that hpos is the most salient channel. The horizontal position
also strikes us immediately as being the most salient when looking at Figure 3.7. For
many of the other channels, the differences are almost insignificant. The use of saliency
greatly facilitates communication and the acquisition of new categories. The case
shown in Figure 3.7 (top images) is an excellent opportunity for the agents to learn about
left versus right. If more channels are salient, it is no longer so easy for the hearer to
guess what meaning might have been used by the speaker and so incoherence would
slip into the group’s lexical system.

Figure 3.7: Three examples of segmented images. The topic is indicated by a dashed
bounding box in the image of the speaker. Segments which are too small
are ignored. The topics have all been conceptualised as being to the right and
so the same word gofubo has been used to refer to them.

Figure 3.8: Processing converts data on a sensory channel into more usable data for later
categorisation processes.

The various steps that the agents go through in preparing data on sensory channels 
are summarised in Figure 3.8. It is presented here as flowing in one direction, but in fact 
constraints coming from higher level cognitive processing may influence these steps. 
For example, if we say ‘The largest square left of the triangle’, we expect the hearer to
scale the squares with respect to all the objects left of the triangle, not with respect to 
all the objects in the scene. The backward flow of constraints will be studied after I have 
covered the different layers separately. 



Table 3.1: Perceived values for each segment in Figure 3.6.

obj    HPOS   VPOS   HEIGHT   WIDTH   GRAY   AREA
0      0.66   0.95   0.01     0.71    0.19   0.27
1      0.69   0.83   0.07     0.33    0.97   0.21
2      0.99   0.87   0.54     0.72    0.22   0.57
sal    0.03   0.05   0.07     0.39    0.75   0.06

Table 3.2: Data for the area channel for the scene in Figure 3.6.

Object   Raw data   Sensor-scaled   Context-scaled
0        24530      0.27            0.18
1        18924      0.21            0.0
2        50960      0.57            1.00

Table 3.3: Perceived values for each segment in Figure 3.7.

channel   obj-0   obj-1   saliency
HPOS      0.27    0.16    0.11
VPOS      0.20    0.20    0.0
HEIGHT    0.15    0.15    0.0
WIDTH     0.10    0.11    0.01
AREA      0.10    0.10    0.0
R         0.23    0.25    0.02
G         0.32    0.34    0.02
B         0.63    0.65    0.02



3.4 Methodology

I hope the reader now has a much better idea of the enormous challenge that the Talking
Heads face when trying to play a language game about a real world scene, particularly
because they try to develop a lexicon and ontology as well. The images contain a multitude
of ways to make distinctions and they differ slightly between the two agents. It is enough
for a cloud to pass by, causing the light conditions to change slightly, and different values
are immediately seen on the colour channels, possibly leading to different segmentations.
So how can the agents ever get a repertoire of abstract categories and associated words
when the real world shows such a perplexing variation? Very different scenes (for example
the ones contained in Figure 3.7) can intuitively be distinguished with the same
categories (namely [left] versus [right]). But how can such inductive leaps be made?
Scaling and sensor transformation introduce some perceptual constancy, and saliency
helps to restrict the attention to those channels that are potentially effective in
communication, but there is clearly an enormous gap between sensory data and language.

In this book, I will not only report on the outcome of an experiment and what we 
learned about the nature of cognitive architectures, but I will also try to illustrate how 
we tackle such tremendously complicated problems. I will take a moment to explain this 
methodology because it runs like a common thread through the remaining chapters of this
book and differs from the way other subdisciplines of cognitive science go about their 
investigations. 

3.4.1 Putting up scaffolds 

The standard means of attacking a difficult problem is to break it up into subproblems. In
this case, the most obvious subdivision is along the different sides of the semiotic square
I introduced in the previous chapter (Figure 3.9). This leads to three natural subproblems:
perception, conceptualisation, and verbalisation. When breaking up a problem, we can
initially assume that the other subproblems will be solved somehow and that they give
perfect output to the others or provide the right feedback. We can then try to invent
a mechanism that can do the job for the subtask we are investigating in these ideal
circumstances, and test its strengths and limitations. This is like putting up scaffolds to
see whether partial solutions work, before putting it all together.

Figure 3.9: The general problem of production is broken up into three subproblems:
perception, conceptualisation and verbalisation.

I will make extensive use of this methodology. This chapter has focused on perception;
the next chapter focuses exclusively on conceptualisation, assuming that there is
good segmentation and a decent set of sensory channels. Chapter 5 turns to the prob- 
lem of lexicalisation, assuming that the agents have a shared repertoire of meanings and 
agree on what meaning to use in a particular game. Chapter 6 then puts the solutions for 
conceptualisation and lexicalisation together by coupling their inputs and outputs and 
establishing the appropriate feedback connections. Chapter 7 then couples this system 
to the perceptual processes discussed in this chapter, so that we get back to our original 
goal: understanding how perceptually grounded language communication is possible.

The methodology of breaking a problem into its subproblems goes quite a distance, but 
is not without danger. The processes relating perception to language cannot be modular 
for reasons already mentioned. Each layer receives inputs which are not completely 
reliable and generates a set of possible suggestions rather than a single “correct” solution. 
Constraints have to flow from top to bottom because there is simply not enough reliable 
information to solve the problem with a straightforward sequential decision process. In 
addition, every layer is constantly adapting itself to the surrounding information context. 
New categories may develop, new words need to be learned, new sensory channels may 
even emerge. So, constructing a global system is going to be more than simply putting 
together its parts. There are complex behaviours that will only become visible when the 
appropriate non-modular couplings are put in place, and this is notoriously difficult to 
do and study. 

3.4.2 Idealisation and realism 

To make this problem more manageable I will adopt a second complementary method, 
widely used in engineering. We can keep the various aspects of the global process intact,
but simplify the challenge. We start with extremely idealised operational circumstances
and then gradually add more and more realism, until the system is ready to face the real 
world. At every step we first establish whether the mechanisms still work, which means 
in the guessing game that communicative success moves up. If this does not happen, we 
must investigate what needs to be added or changed and perhaps reduce the complexity 
again to find a new solid ground. If a language system does get established, we can 
examine the limits of the process before increasing the challenge once more. During 
the original research, we extensively used this stepwise approach, often sliding down 
the mountain when too much complexity was introduced too quickly so that we were 
forced to take a few steps back and tackle simpler challenges until we got our feet on
the ground again. 

We can simplify or scale down in four dimensions. The first dimension is that of the 
agents. I will often start by investigating a group of only two agents, then scale up to 
larger and larger groups, and then tackle the problem of an open-ended population in 
which new members enter and others leave. Each of these steps introduces additional 
difficulties. For example, when there are only two agents, the risk of synonyms forming 
is low - as we will see later - because the agents only interact with each other and so they 
can rapidly see whether the other one has a word for the same meaning. But when the 
population is scaled up it is highly likely that different subgroups invent different words 
or develop new meanings. These variations take time to propagate until the group settles 
into a coherent state. When there is an open-ended set of agents, the new agents in the 
population must acquire a language which already exists, which implies that we must 
have demonstrated that language acquisition goes sufficiently fast to explain cultural 
transmission. When agents leave the community, they take some of the knowledge of
the language conventions with them, and so we have to examine whether the whole
community might destabilise.

Second there is the dimension of the real world and how it relates to perception and 
action. For this book, I will in any case only use worlds restricted to coloured shapes 
on a two-dimensional surface, thus drastically reducing the perceptual challenges, and 
constrain the perceptual task still further by supplying the agents only with a limited set 
of sensory channels. We also initially performed many simulations with artificial worlds
(to be explained shortly) where we could carefully control various parameters, such as 
the number of objects in a scene, their complexity, or their variation. 

But restricting the environment is not enough. Other aspects of perception can se- 
riously disrupt language communication in a variety of ways and each of them can be 
neutralised. We have already seen that in normal circumstances, the agents do not share 
the same image of reality, which introduces a whole array of problems. They might
consequently segment the scene differently, the pointing might not be accurate enough
(even if they both were referring to the same object), or the segments might have very
different sensory characteristics (as already discussed for Figure 3.7). We have reduced
these sources of difficulty by initially using only one camera, then scaling up to two 
cameras in the same room, and only then scaling up to many different cameras located 
all over the world.

Increasing or decreasing the importance of saliency also helps. When only the most
salient sensory channel is used, agents have a higher chance of guessing the right meaning
(at least if their perception converges reasonably well), and so they can better guess
the meaning of an unknown word or go less astray with words for which they
already have a shared meaning. So by varying the saliency, we can control the degree of
semantic ambiguity in the agents’ communications. Divergent perception and confusing 
regularities in the environment are sources of polysemy in language, which the agents 
need to dampen if they want their communications to be efficient and reliable. 

Another real world aspect that we can make more or less complex is related to the 
movements of the head and the pointing. A hearer must be able to look in the same 
direction as the speaker, so that there is a minimum shared context. The hearer must be 
able to point to the topic guessed, so that the speaker can see whether the communication 
succeeded. If it did not, the speaker must be able to point to the topic. These physical 
interactions are well within the state of the art in robotics, and there even exist various
learning systems capable of bootstrapping this capacity from scratch.12

But these processes will never be completely reliable either. So we can increase or 
decrease the challenge to the agents by making the non-verbal communication and co- 
ordination more or less challenging. In the experiments reported in this book, speaker 
and hearer can communicate to each other the direction in which they are looking.13 Because
they still see a different image due to their physical position, the interaction is still
partly unreliable but it is sufficiently stable to enable the agents to bootstrap a shared 
communication system, which then in turn can help to establish physical coordination. 

Third, there is the cognitive apparatus of the agents implicated in language. Here we 
can start with a simple associative memory for the lexicon that can only handle single 
words associated with single meanings, and then scale up to conjunctions of meanings 
covered by single or multiple words, and then still further to open-ended complex mean- 
ings and syntax. Each step requires additional machinery in the agents, which will au- 
tomatically lead to more complex linguistic forms, but it is our experience that there is 
great value in trying to understand the basic process of word meaning acquisition before 
attempting to install more cognitive complexity in the agents’ architecture. Even for the 
acquisition and propagation of single word utterances there are still many open-ended 
problems. 

A final dimension concerns the transmission and perception of the utterance. Here 
again we can scale up or down the challenge to the agents, from full-blown uncon- 
strained speech in noisy environments down to perfect direct transmission of the lan- 
guage form by the speaker and perfect recognition by the hearer. Complexity along this 
dimension has been reduced to the minimum in the Talking Heads experiment. Agents 
transmit utterances directly although a speech sound is generated so that human listen- 
ers can hear which utterance is produced as part of a game. 



12 This is the problem of hand-eye coordination performed in living systems by the vestibulo-oculomotor
systems. See: Anastasio & Robinson (1989).

13 This real world interaction has been developed by Kaplan (1999).

Figure 3.10: An example of the computer-generated scenes from the geom world. Each
shape is labeled for further reference.



3.5 The geom world 

One way to perform controlled experiments quickly and on a large scale consists of arti- 
ficially generating the perceptual input to the agents. This has the additional advantage 
that we can simplify the situations the agents have to work in (for example have fewer 
types of shapes) and let them start initially with a shared perception. A simulation world 
that we have used extensively for many simulations reported later is the geom world. 

The geom world generates geometric shapes similar to the physical figures we paste on 
the white board. Possible geometric shapes are circles, triangles, squares, rectangles, etc. 
We can control the complexity of the scenes that are generated through a few parameters,
for example the minimum or maximum number of figures, or the possible repertoire of
shapes. We ignore colour and use only different grayscale values. To construct a scene
the computer simulation program first chooses a random number of figures. Then for
each figure, a shape is chosen, and random values for the main characteristics (position,
height, width, grayscale) are set.
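The scene construction just described could look roughly like the following Python sketch.
It is not the original geom world program: the names Figure and generate_scene, the
parameter names, the use of normalised units between 0.0 and 1.0, the restriction to four
shapes, and the rule that circles and squares get equal height and width are all
assumptions made for illustration.

```python
import random
from dataclasses import dataclass

# Assumed repertoire; the text lists these shapes and adds "etc.".
SHAPES = ("circle", "triangle", "square", "rectangle")


@dataclass
class Figure:
    shape: str
    hpos: float
    vpos: float
    height: float
    width: float
    gray: float

    @property
    def area(self):
        # Area of the bounding box, as height times width.
        return self.height * self.width


def generate_scene(min_figures=2, max_figures=5, shapes=SHAPES, rng=None):
    """Choose a random number of figures, then a shape and random values for each."""
    if rng is None:
        rng = random.Random()
    scene = []
    for _ in range(rng.randint(min_figures, max_figures)):
        shape = rng.choice(shapes)
        height = rng.uniform(0.1, 1.0)
        # Assumption: circles and squares have equal height and width.
        width = height if shape in ("circle", "square") else rng.uniform(0.1, 1.0)
        scene.append(Figure(shape, rng.random(), rng.random(), height, width, rng.random()))
    return scene


scene = generate_scene(rng=random.Random(42))  # fixed seed for a repeatable scene
```

Passing a seeded random generator makes a scene reproducible, which is convenient when the
same scene has to be shown to several agents or replayed in a later experiment.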

An example of a computer-generated scene containing only rectangles is shown in
Figure 3.10 (another one was shown earlier in Figure 3.6). Given such a scene, each
agent calculates the bounding box and the segment-characteristics for every
computer-generated figure. For the scene in Figure 3.10, which contains three squares and a
rectangle, the values of the channels are summarised in Table 3.4. After sensor-scaling we
get Table 3.5.

The output of the simulation is exactly the same as the one from real vision so that 
we can easily switch between simulation and physical experimentation. 

Table 3.4: Sensory data for the scene shown in Figure 3.10.

obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY    RATIO   AREA
0     116    166    293      293     0.777   1.0     85849
1     692    317    148      224     0.449   1.0     33152
2     192    64     137      137     0.408   1.0     18769
3     167    770    277      277     0.201   1.0     76729

Table 3.5: Sensory data after scaling for the scene shown in Figure 3.10.

obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY   RATIO   AREA
0     0.09   0.13   0.64     0.64    0.78   1.0     0.95
1     0.55   0.25   0.16     0.41    0.45   1.0     0.37
2     0.15   0.05   0.12     0.12    0.41   1.0     0.21
3     0.13   0.62   0.59     0.59    0.20   1.0     0.85

Working with simulations has obvious advantages for speeding up development and
systematic testing, but it does not replace experimentation with physical robots. It is true
that building an experimental apparatus such as we built for the Talking Heads experiment
is a highly non-trivial engineering project, particularly because of the teleporting
infrastructure that enables the agents to travel to multiple sites and thus experience
different physical environments or the same environment from different points of view.
One might therefore wonder why we have gone to such great lengths to construct
a real-world physical infrastructure. Why are experiments with robotic agents
necessary or desirable? Is it not enough to conduct simulation experiments given that
they can be done so much more easily?

Computer simulations calculate the consequences of a theoretical model. For example, 
we can implement Newton’s model of the solar system and simulate the movements of 
the planets around the sun by calculating for small time steps the position of each planet 
and hence the trajectories they follow. All the aspects of a calculation are under the 
scientist’s control and it is therefore possible to use this method to examine whether a 
theory can be operationalised, whether it is coherent, and whether it is complete, i.e. 
whether it covers all aspects of the phenomena one tries to understand. Simulations can 
be inspected, re-executed or reprogrammed by anyone who cares to challenge them and 
other researchers can try to achieve the same performance with alternative approaches, 
so that different theories can be compared in an objective way. 
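To make the idea of calculating positions for small time steps concrete, here is a minimal
sketch in Python. It is a deliberately reduced version of the solar system example: a single
planet, a sun fixed at the origin, and normalised units in which the gravitational parameter
gm equals 1, so that the starting speed of 1 gives a circular orbit. The function name and
the choice of a semi-implicit Euler update are assumptions made for this illustration.

```python
import math


def simulate_orbit(steps, dt, gm=1.0):
    """Advance one planet around a fixed sun at the origin in small time steps."""
    x, y = 1.0, 0.0    # start one distance unit from the sun
    vx, vy = 0.0, 1.0  # speed that gives a circular orbit when gm = 1
    trajectory = [(x, y)]
    for _ in range(steps):
        r3 = (x * x + y * y) ** 1.5
        ax, ay = -gm * x / r3, -gm * y / r3  # inverse-square attraction towards the sun
        vx += ax * dt  # update the velocity first...
        vy += ay * dt
        x += vx * dt   # ...then the position (semi-implicit Euler step)
        y += vy * dt
        trajectory.append((x, y))
    return trajectory


dt = 0.001
steps = int(2 * math.pi / dt)  # roughly one full revolution
path = simulate_orbit(steps, dt)
```

Every quantity in such a calculation is under the programmer's control, and anyone can
rerun it with a different step size or a different update rule and compare the resulting
trajectories.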

Computer simulations can be set up for any theory which is formalisable, and hence 
for theories of cognition and language as well. We need to implement the cognitive 
architecture of the agent and then examine what happens when the agent engages in 
interactions, i.e. when data is supplied from real or simulated scenes. Computer simulations
of cognitive mechanisms provide proof that they can be instantiated on physical
systems, even though it may still be a mystery how the brain, itself a physical system,
implements similar mechanisms. Computational simulation is of course not restricted 
to the theories I put forward in this book. Any theory claiming to explain the origins 
of language and meaning should be testable by computer simulation, so that it is clear 
what form the architecture takes and whether it does the job. All this is a big step com- 
pared to the early days of cognitive modelling when one was supposed to believe on 
faith whether a certain cognitive architecture could be instantiated by a physical system 
and whether it was indeed able to exhibit the functionality that was ascribed to it. 

But computer simulations have two major shortcomings, which make them only one
of the tools in the toolkit of the cognitive scientist. First of all, computer simulations
by themselves do not empirically validate a theory. When a computer screen shows 
pictures of planets moving around the sun, there is nothing that says that this is also the 
way the planets move. To validate the theory, large amounts of data need to be collected
about the natural phenomena, and the simulation results have to be compared with the
real world data to determine a sufficient fit. In this book, we are primarily concerned 
with formulating new plausible models for cognition and particularly for the origins of 
cognition, not yet with their empirical validation; this exciting work is left for the future. 

However, there is a second shortcoming. Cognitive systems must deal with the real 
physical world in its infinite variety and complexity. If we perform only computer sim- 
ulations, we not only model the cognitive mechanisms of the agents but also the en- 
vironments which they are confronted with. We validate the model only with respect 
to stylised worlds. Of course we can make such worlds much more sophisticated, but 
there is never a guarantee that they are going to be representative of the real world that 
embodied cognitive agents have to deal with. The more realistic computer simulations 
become, that is the more aspects of reality they reliably take into account, the more 
the simulation starts to approach reality and the more a simulation may tell us whether 
the proposed mechanisms live up to realistic circumstances. But there will always be a 
huge difference between computer-simulated worlds and real worlds, as any roboticist 
knows all too well. And this is why we need to do experiments with physical robots and 
real-world environments. 

Of course, it is useful and efficient to conduct simulations as part of the discovery 
process but true validation of a cognitive architecture can only come when the system is 
confronted with a real physical environment. It forces us to attack the final known or hid- 
den assumptions in the theory’s operational validation. It forces us to address the issue 
of sensing, using real world physical sensors. Therefore, an experiment like the Talking 
Heads experiment has a much greater force in convincing the sceptic. It is like testing 
the design for an airplane by building and flying one, as opposed to demonstrating the 
idea only in simulation. 



3.6 Conclusions 

This chapter focused on the perceptual layer which is responsible for interfacing the
external physical world with the internal world of the agents. The interface is based on
sensors to transduce external states into internal states and actuators to transduce
internal states into external states. Four sorts of operations take place on raw sensory data,
making them more amenable to categorisation, conceptualisation and consequently
language communication: features in the form of micro-structures are extracted, the image
is segmented into coherent units using region growing or contour finding, segment
characteristics such as average colour or size are derived, and characteristics are transformed
or scaled to bring out salient features and help achieve categorial constancy.

The Talking Heads use standard techniques from computer vision research. Research 
in this field is indeed so advanced that a large number of algorithms for segmenting 
images and extracting information about image segments could be taken off the shelf 
and re-programmed to get a vision system that is adequate for the task setting of the 
Talking Heads experiment. This will, of course, no longer be the case if the environment 
is made more challenging and more open to novel situations. Very quickly we would then 
reach the limits of what can currently be done using standard techniques. Nevertheless 
for our purpose of exploring language and meaning, the Talking Heads vision system 
gives a sufficiently rich and reliable segmentation and characterisation of the scene so 
that an investigation of how real world scenes might be conceptualised and verbalised
has become possible.



4 The Discrimination Game

The previous chapter laid the perceptual groundwork for investigating conceptualisation
and language communication. It proposed a set of mechanisms for segmenting a scene
and extracting characteristics about each segment. The next task of the agent is to use
these results from the perceptual layer to categorise and hence conceptualise the scene.
For the time being, I will ignore the full complexity of natural language meaning and
focus only on the simplest type of conceptualisation one can imagine, namely categories,
which logicians refer to as unary predicates, as well as conjunctive combinations
of categories. Words like blue, light or square name such categories. We not only need
a way for the Talking Heads to use such categories to conceptualise a scene based on
visual perception, but we also need mechanisms that can explain how such categories
might develop or be acquired by each agent autonomously and without supervised training.
This is clearly an enormous challenge and no universally accepted solution to this
problem is known.

Categorisation has fascinated philosophers and scientists since the onset of thought,
not only because it is one of the most fundamental capabilities of the human mind, but
also because the subject matter immediately raises some intriguing paradoxes and puzzles.
First of all, we have already seen that there is an enormous gap between the symbolic
world of objects and categories and the subsymbolic world of analog sensori-motor
signal streams. A particular sensory signal is highly context-dependent and inherently
noisy due to the partial unreliability of sensors and their limited accuracy. Sensory
processing, transformation, and scaling go some way towards achieving perceptual constancy
but cognitive processes must clearly make up for the fleeting erratic nature of reality. Here
we are immediately confronted with a first paradox. Univocal categorisation seems only
possible when an interpretation of reality is within reach, for example when there are
strong expectations, possibly coming from language utterances. But this interpretation
depends itself on categorisation. How can this apparent causal circularity be broken?

Furthermore, if it is already so difficult to map categories on real world sensory data
streams, how on earth can categories form and become stable? Young children appear
to acquire perceptually grounded categories effortlessly and without systematic training.
Psychologists have often observed the acquisition of distinctions with very few clear
example cases and no overt feedback. On the other hand, categories appear to some extent
culture-specific and different individuals make more refined distinctions depending on
the sort of tasks they engage in. This is the case even in such a basic domain as colour
categorisation. Painters or textile designers make distinctions ordinary humans do not
see and have developed a very extensive repertoire of terms to talk about these
distinctions. These observations highlight a second paradox: the effortless early acquisition
of perceptual distinctions suggests that categories are innate. But the dependence on
culture, individual variation and specialisation suggests that categories are learned. How
can these two observations be reconciled?

Then there is a third paradox, first suggested by Jean-Jacques Rousseau. In order to 
communicate, the speaker and the hearer need to share the building blocks of the conceptualisations
underlying their communication. If I say “the wine glass on the table”, I
expect the hearer to be able to recognise what objects are tables, when something is
on the table or not on the table, and when something is a wine glass versus another
kind of glass. But if every individual learns categories independently and autonomously,
how can they ever become shared? It is assumed that language helps in establishing a
shared ontology within a language community, but language itself depends on shared
categories. So how can the whole system ever get off the ground? How can this
chicken-and-egg situation be broken?

I will begin this chapter by outlining the empiricist and rationalist positions, which 
have dominated much of the philosophical discussion on categorisation and have in- 
directly influenced attempts to build artificial systems able to perform some form of 
categorisation. I will then propose an alternative selectionist approach and study a 
categorisation system based on it. It is shown that agents endowed with this system are 
able to develop an adequate repertoire of categories for distinguishing objects in their 
environment and that this repertoire remains adaptive when important changes occur in 
the environment. This goes some way to resolve the paradoxes of meaning, but the story 
remains incomplete. To explain how agents can share the same categories even if they 
develop their ontologies independently of each other, I will later argue for a co-evolution 
of language and meaning. I will show that when ontological development is coupled to 
lexical development, the two become co-ordinated with neither a central co-ordinator 
nor prior design. 



4.1 The paradoxes of meaning 

There are basically two philosophical doctrines that have tried to address the paradoxes 
of meaning. One doctrine is known as empiricism, the other one as rationalism. Many 
philosophical texts are available that introduce these doctrines and their historical
roots. The problem of the origins of language and meaning was already a much debated
topic among philosophers in the 18th century; see for example Rousseau (1781).

The debate between rationalism and empiricism is still very much alive today and is now
based on much more knowledge about what it means for something to be innate or what
the limits of induction are. See Elman et al. (1996) for the most recent arguments and
counterarguments. Compared to the full richness of human experience, I have necessarily
had to adopt a very narrow view, focusing only on basic perceptually grounded
categories. More complex categories related to human relationships, emotions, social organisation,
or beliefs will not be considered, and it would be very difficult to do so with
the methodology used here. See Varela, Thompson & Rosch (1991) for a broader discussion.






4.1.1 The empiricist’s stance 

The empiricist tradition has a long and reputable history. The first clear formulations 
emerged as a counterreaction to 17th century rationalism, with the work of Hume, Locke, 
and others. Empiricists were inspired by the early success of the natural sciences, which 
had insisted on observing reality as it is, through sensory experience and stepwise induction.
The empiricist attitude has continued to dominate epistemology in the 19th
and 20th centuries. It was formulated by Bertrand Russell in a doctrine called logical
empiricism, and elaborated by generations of philosophers from Carnap to Quine into
rich logical frameworks and precise inductive methods. Empiricist explanations of
categorisation today very much dominate the neurosciences.

Empiricists argue that categories capture what is common or invariant between cases 
and that neural networks in the brain can detect this invariance. They also argue that
these commonalities can be learned by progressively abstracting away from the details of
specific cases, even if the stimulus is poor. A child sees many examples of red objects
and progressively grasps the abstract concept [red] by retaining what is common to all
of them. If categories are the result of a general inductive learning procedure, the areas of
the brain responsible for categorisation do not have to be specialised or pre-programmed
for recognising specific categories. Empiricists therefore believe that these areas can take the
form of general-purpose networks that learn any kind of category by abstracting
from the examples supplied by the environment. On this view, categories are not innate but
learned.

In recent decades, various designs for neural network models have been proposed 
that reflect this empiricist stance. 1 The input nodes of these networks receive data from 
sensory channels of the sort discussed in the previous chapter. They are connected to 
higher level nodes which use the data to decide whether a category applies. The connec- 
tions are weighted and a positive output signal is produced when the weighted sum of 
the inputs exceeds a certain threshold. Thus the networks exhibit some of the flexibil- 
ity, tolerance to variation, and context-dependence seen in human categorisation. The 
weights and thresholds are learned by propagating back errors in categorisation. If a 
node makes a positive identification when it should not have done so, the weights of the 
incoming connections are lowered and the threshold is increased, so that it is less likely 
that the threshold will be exceeded again for the same situation the next time around. 
Conversely, if the network makes a negative identification where it should have made a 
positive one, the weights of the incoming connections are increased and the threshold 
lowered. It is known through mathematical proofs that such networks indeed stabilise
on reliable categorisations, if the environment remains sufficiently constant. 2 These inductive
neural networks therefore constitute the empiricists’ first serious proposal for bridging the
gap between sensory signals and categories and for explaining the origins and acquisition
of categories.

1 The first neural network models emerged in the fifties from the work of neurologists and computer scientists
like Donald Hebb, McCulloch and Pitts, Rosenblatt, and others. There was a strong first wave of
enthusiasm in the sixties, as illustrated for example in Minsky & Papert (1969). A second wave developed
in the mid-eighties when new, more powerful network architectures were discovered that could handle intermediary
representations and later on temporal structures (see the overview in Churchland & Sejnowski
1992).

2 This type of neural network and some of its main variants are discussed at length in the classic textbook
by Rumelhart & McClelland (1986).
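
The error-driven adjustment described above can be sketched for a single node. The Python fragment that follows is an illustration only and not one of the networks cited in the footnotes: it models one threshold unit rather than a multi-layer network, and the names ThresholdUnit, rate and train, as well as the fixed step size, are assumptions introduced here. Inputs are assumed to be non-negative channel values, so that subtracting a multiple of the input indeed lowers the weights.

```python
from typing import List, Sequence, Tuple


class ThresholdUnit:
    """One category node: fires when the weighted input sum exceeds a threshold."""

    def __init__(self, n_inputs: int, rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_inputs
        self.threshold = 0.0
        self.rate = rate  # hypothetical step size, not taken from the text

    def fires(self, inputs: Sequence[float]) -> bool:
        total = sum(w * x for w, x in zip(self.weights, inputs))
        return total > self.threshold

    def update(self, inputs: Sequence[float], target: bool) -> bool:
        """Adjust after one example; return True if the unit was already correct."""
        predicted = self.fires(inputs)
        if predicted and not target:
            # False positive: lower the incoming weights, raise the threshold.
            self.weights = [w - self.rate * x for w, x in zip(self.weights, inputs)]
            self.threshold += self.rate
        elif target and not predicted:
            # False negative: raise the incoming weights, lower the threshold.
            self.weights = [w + self.rate * x for w, x in zip(self.weights, inputs)]
            self.threshold -= self.rate
        return predicted == target


def train(unit: ThresholdUnit,
          examples: Sequence[Tuple[Sequence[float], bool]],
          max_epochs: int = 100) -> int:
    """Present all examples repeatedly; return the epoch in which no error occurred."""
    for epoch in range(1, max_epochs + 1):
        errors = sum(0 if unit.update(x, t) else 1 for x, t in examples)
        if errors == 0:
            return epoch
    return max_epochs
```

Trained on a single hypothetical size channel, with small values labelled negative and large values labelled positive, such a unit settles on a cut somewhere between the two groups. Note that the examples have to be presented several times, and that convergence of this kind is only to be expected when a single threshold can separate the cases at all.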

4.1.2 The rationalist’s stance 

A radically different approach to categorisation has been proposed by rationalists. The 
rationalist point of view had its first clear formulation in Plato’s philosophy. A resur- 
gence of rationalist ideas took place in the 17th and beginning of the 18th century, par- 
ticularly through the work of Descartes and Leibniz. More recently, in the second half 
of the 20th century, a strong rationalist movement emerged again, mainly under the in- 
fluence of Noam Chomsky. Today rationalist attitudes very much dominate linguistics 
and cognitive psychology. 

Early rationalists, like Descartes, were dualists, who saw the mind and the body as belonging to
totally different realms. In such a view, it becomes very difficult to scientifically investigate
perceptually grounded categorisation, and an explanation of how the brain works
seems more remote than ever. However, most contemporary rationalists (like empiricists)
now believe that categorisation is done by physical structures in the brain. Indeed,
if certain parts of the brain are damaged, the capability to categorise disappears or is 
severely restricted and distorted (Deacon 1997). 

Rationalists claim that categories exist a priori and therefore categorisation comes 
from within. They argue that humans have a repertoire of ideal universal forms, which 
they project onto reality. Reality itself is a weak, imperfect reflection of these forms, like 
the shadows of objects on the wall of a cave. Because of this poverty of stimuli, categories 
(particularly the perceptually grounded categories that are the focus of our attention in 
this book) are claimed to be unlearnable by induction and must therefore be innate. 3 

This innateness hypothesis suggests that the brain comes with categorisation or- 
gans, small neuronal circuits capable of performing the mapping of some idealised uni- 
versal form onto reality. Consequently the human genome must include a set of concept 
genes which regulate how each of these categorisation organs should grow during de- 
velopment. Rationalists claim that it is absurd to think of categories as being learned 
from example cases supplied by the environment, just as it is absurd to say that the 
hand learns to grow five fingers. 

3 Strong forms of innateness have been defended by Fodor (1983) and many philosophers and linguists associated
with the generative grammar paradigm. See the discussion in Wierzbicka (1992), particularly the
introductory chapter. In artificial intelligence research, particularly the logic-oriented tradition, there is
often an implicit acceptance that basic categories are innate, but that derived categories, which can be formulated in
terms of more primitive concepts, can be learned (see McCarthy 2008). There are also large-scale efforts
under way to build an ontology as rich as human ontologies; see the discussions around Doug Lenat’s CYC
project in Steels & McDermott (1994).

4.1.3 Arguments for and against rationalism

There is something to be said for both a rationalist and an empiricist approach; otherwise
so many serious thinkers could not have believed fervently in one position
or the other. Rationalists point to the fact that children acquire concepts surprisingly
quickly and apparently from very few stimuli, and that anthropological observations
have shown that there are strong universal tendencies for basic perceptually grounded
categories, such as colour or space. However, more detailed observations show that the
acquisition of categories in children is in fact very slow. For example, concepts like
cause-effect and the correct use of the word because, the proper use of tenses, etc. all
take years to develop. Adults keep acquiring new categories throughout their lifetime,
which makes it difficult to maintain that these categories are part of the human genome. For example,
airline pilots and sailors categorise the direction and strength of winds and the shapes
and colours of clouds, so as to predict turbulence, future weather, or advantageous
trajectories. There are occasionally profound differences in how cultures conceptualise
reality. These differences do not seem to be innate because everybody who has had
sufficient exposure to the culture, preferably at an early age and therefore without too
many preconceptions, can normally acquire them. 4

Further objections against the innateness hypothesis have come from the camp of 
neurobiology. In lower animals, neural circuits have been identified which perform very 
specific categorisations of reality. For example, the frog is sensitive to objects of a par- 
ticular size moving in front of it at a particular speed, specifically the kind of objects 
that constitute a potential source of food for the frog. The dedicated neuronal circuits 
performing this categorisation have been shown to be innate and shared by all frogs. 
But higher animals and humans exhibit an enormous plasticity, both in terms of the 
repertoire of categories they recognise and in terms of the actual brain structure. 5 

The difference between an animal reacting to a limited set of environmental stimuli 
with a rigid neural apparatus and a cognitively endowed human being is precisely the 
high degree of flexibility and adaptivity of the latter. It is therefore not surprising that 
clear-cut categorisation organs which have the same structure in all humans and are 
located at the same position in the brain have not been found. The microstructure of the
brain does not consist of neatly separated organs and it therefore does not make sense
to look for genes that regulate their maturation. Neurobiologists tell us that the brain
appears more like an organically grown tissue than a delicately tuned machine
laid out by genetic programs. In contrast to insect brains or brains of lower animals,
the mammalian brain is capable of regenerating itself to some extent after damage, and 
brain tissue from one higher order animal can be implanted in another one, causing 
a resurgence of lost function, even if the source location of the transplant is different. 
For example, if tissue from the visual cortex is implanted in the auditory cortex, it will 
regenerate and function as part of the auditory cortex. Pathways to the visual cortex 
can be redirected to the auditory cortex, which will cause the auditory cortex to take on 
functions of visual processing. 

4 Rationalists often argue that there is no other way to explain the child’s rapid acquisition of concepts, but
detailed psycholinguistic observations have shown that child language understanding (which is the clearest
sign of whether certain concepts have been acquired) is often deceptive, because non-verbal strategies may
lead to appropriate answers to adult questions and thus the appearance of understanding.

5 Evidence for the remarkable plasticity of the human brain is reviewed in Edelman (1987) and Elman et al. 
(1996). 

Even supposing that there is a strong genetic determination of micro-level brain structure,
there is still the question of how the hundreds of thousands of concepts employed by
adult human beings might have become included in the human genome. Saying that a
perceptually grounded category is innate does not explain anything. One has to show
a plausible evolutionary history for the categories hypothesised to be innate and prove
that the hypothesised concept genes can propagate sufficiently fast in the human population. 6

4.1.4 Arguments for and against empiricism 

Empiricists have had considerable success in coming up with inductive learning mech- 
anisms. They have even been demonstrated on autonomous robots in direct interaction 
with the environment. At the same time, the learning mechanisms proposed so far turn
out to be very fragile. 7 The human experimenter has to carefully set up the architecture
of the network (the number of nodes, the number of layers of nodes, and how they are 
connected), tune the learning parameters, and supply just the right set of test cases. Per- 
formance may degrade when too many cases are seen. Even worse, when new cases 
are supplied that require a revision of categories already learned, a substantial portion 
of the earlier cases must be resubmitted to retrain the network. All this contradicts the 
robustness and open-ended extensibility of human categorisation. In addition, the learn- 
ing mechanisms proposed have been slow to consistently acquire categories. A large 
number of cases are required and often cases must go through multiple iterations. Moreover,
any inductive method is a slave to the data. A category will only become reliably
recognised when it is statistically significant.

So there are questions both for a rationalist and an empiricist approach. If categories 
are innate how can new categories, which are required when the environment or the task 
settings change, form so quickly? How can the genetic code store the vast repertoire of 
categories humans routinely employ, and how can we explain the diversity with which 
different cultures approach reality? On the other hand, if every child independently
acquires categories by learning, we must ask how they can do so, given the poor
quality of the examples they see. How is it that learning is so rapid? How can different independent
learners arrive at a repertoire of categories that is sufficiently shared to make
language communication possible? How do we reconcile the apparent innate origins 
of perceptually grounded categories with their remarkable adaptedness to the changing 
needs and open-ended environments that the individual effectively encounters? 



6 See Worden (1995). This article argues, based on the genetic difference between humans and other primates, 
that there are limits to genetic transmission of cognitive content, like categories or grammars. For discus- 
sions on the speed of gene spreading compared to cultural evolution, see Cavalli-Sforza, Cavalli-Sforza & 
Thorne (1996). 

7 The weaknesses of traditional connectionist networks with respect to speed and resilience are discussed at
length in Quartz & Sejnowski (1997).

4.2 Selectionism

The difficulties encountered with the empiricist and rationalist points of view make it 
worthwhile to explore alternative solutions. The one I propose and further explore in this 
book has been inspired by two key principles from biology. The first principle is that of 
selectionism. It requires a growth process in the agents that generates possible structures, 
even in the absence of examples, and a pruning process that removes those that are irrel- 
evant. The growth and pruning process is assumed to be biologically given but not the 
categories that result from it. The second principle that I will adopt is interactionism, put 
forward by biologists to understand how genetic influences and environmental impact 
cooperate to shape an organism. Interactionism was first suggested by Piaget (originally
a biologist) to explain the growth of mental capacity in the child. His numerous experiments
show a gradual, progressive construction of increasingly complex ways of
categorising and conceptualising reality. The child encounters situations that can be
assimilated, and thus cause entrenchment of existing structures, as well as situations 
that cannot be handled and require the child to accommodate with new constructions 
or reorganisations. 8 

4.2.1 Principles of selectionism 

Selectionism is a general means of explaining the origins of complexity. It requires: (1) 
a process that can generate a repertoire of possible solutions in a basically random fash- 
ion, (2) a process for preserving solutions so that there can be a gradual build up of 
more complex solutions, and (3) a selectionist force, which uses feedback from the envi- 
ronment and influences preservation so that adequate solutions are retained and others 
discarded. 9 
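
The three requirements can be written down as one generic loop. The sketch below is a toy in Python under stated assumptions: solutions are arbitrary values, the feedback is a numeric score supplied by the caller, and the function name, the capacity limit and the even split between fresh generation and variation of a preserved solution are all choices made here for illustration, not part of any biological model.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def selectionist_loop(
    generate: Callable[[random.Random], T],   # (1) blind generation of a solution
    vary: Callable[[T, random.Random], T],    # (1) blind variation of a kept solution
    feedback: Callable[[T], float],           # (3) score returned by the environment
    rounds: int,
    capacity: int,                            # hypothetical limit on what is preserved
    rng: random.Random,
) -> List[T]:
    """Return the preserved repertoire, best solution first."""
    repertoire: List[T] = []
    for _ in range(rounds):
        # (1) Generation: either a fresh random solution or a random
        # variation on something already preserved (gradual build-up).
        if repertoire and rng.random() < 0.5:
            candidate = vary(rng.choice(repertoire), rng)
        else:
            candidate = generate(rng)
        # (2) Preservation: the candidate joins the repertoire.
        repertoire.append(candidate)
        # (3) Selection: feedback decides which solutions are retained.
        repertoire.sort(key=feedback, reverse=True)
        del repertoire[capacity:]
    return repertoire
```

Nothing in the loop inspects why a solution scores well. The candidates are produced without regard to the feedback, and the feedback only influences which of them survive, which is the point of contrast with instruction or induction.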

In the case of the Darwinian explanation for the evolution of species, a solution is 
an organism capable of surviving in a given environment. Types of organisms are pre- 
served in the genetic material as it is copied from one generation to the next. Variations 
are produced due to errors in gene copying, mutations, gene insertion, etc. The feedback 
comes from the natural environment. Organisms that do not flourish are less success- 
ful in reproduction so that their genetic material and hence the organisms this material 
generates are less likely to be preserved. Selectionism contrasts with Lamarckian instructionism,
which claims that the organism transmits its adaptations and what it has
learned during its lifetime to its offspring.



8 The work of Waddington is typical of the interactionist approach to development; see Waddington (1975).
Such a view does not necessarily imply that there is a pre-determined course of development. Piaget emphasised
a progressive, dynamical view of development (Piaget 1985). His work arose prior to the detailed
computational modelling that is now commonplace in cognitive science, and there is still an enormous
amount of work left to demonstrate operationalisations of these insights. For a more recent discussion of the
issues, see Thelen & Smith (1994).

9 There are many excellent introductory accounts of selectionism. One of the best known is: Dawkins (1976). 
The recent application of selectionism to the automatic derivation of computer programs clearly demon- 
strates how general the principle is. See Goldberg (1989) and Koza, Goldberg & Fogel (1996). The principle 
has now even been applied to the development of computational hardware. See Sipper, Mange & Andres 
(1998). 

From the viewpoint of instructionism, the neck of a giraffe is so long because at some
point giraffes with shorter necks often stretched their necks. This was transmitted to
the offspring, which were born with slightly longer necks. They also stretched their necks
further, and so on. In a selectionist framework, it is assumed that the natural variation in 
the population is exploited. Some giraffes have longer necks than others, and if this gives 
an advantage their genes proliferate. Within the population born with the new gene dis- 
tribution there are still variations and once again those with a longer neck proliferate, 
and so on. Instructionist processes build further upon acquired characteristics. There 
is progressive learning from one generation to the next. Selectionism assumes natural 
variation and progressive dominance of fitter variants, with neither learning nor trans- 
mission of acquired characteristics. Selectionism can make sudden jumps and therefore 
has the potential to go much faster than the transmission of acquired characteristics. 

In the case of the immune system, a solution is an antibody capable of combatting intruders
foreign to the organism. Here again an instructionist approach can be envisioned
and was believed for a long time to be the case. If this belief were true, it would mean
that the immune system somehow learns the appropriate response and then preserves
that response. The selectionist viewpoint of the immune system argues instead that it
generates autonomously a very large repertoire of possible antibody responses. When a
foreign body invades, the response is already there; it is simply amplified (Varela et al.
1988).

4.2.2 Selectionist cognitive systems 

The Talking Heads experiment applies the same line of thinking to the acquisition
of categories, and later experiments (discussed in part II) apply it to the acquisition of more complex
meaning and even grammar. It implies that there is no learning taking place in the
empiricist sense of induction from a series of examples, but that instead three processes
are active:

1. a process whereby structures capable of categorising reality are generated in a basically
random fashion,

2. a process to preserve these structures and build further upon them to enable a
steady increase in complexity, and

3. a selectionist force which prunes away those structures that are irrelevant and
retains the ones that are successful and needed.

As I will expand upon in more detail in this chapter, categorisation can be carried out 
by discrimination trees where the nodes in the tree filter objects depending on whether 
they fall within a sensory region or not. I will show that the discrimination trees grow in 
a more or less random fashion and those parts of the tree that are irrelevant get pruned. 
The selectionist feedback comes from the games in which the agent participates. Distinctions
that are effective in discriminating the topic from the other objects in the context
and have been successfully lexicalised are maintained in the lexicon of the community;
the others are discarded.

When there is a high failure rate, the discrimination trees should expand, in the same
way that the immune system gets stimulated (but does not strictly speaking learn) when
the organism is invaded, or that genetic variation increases in periods of stress on a species.
When there is a high success rate, some pruning might be possible. The growth and 
pruning dynamics creates an ecology of distinctions which is constantly adapting itself 
to the situations and tasks the agent encounters, without any innate a priori categories 
and without any inductive learning. 
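
To make the growth and pruning dynamics concrete, here is a deliberately small Python sketch. It should be read as a set of assumptions rather than as the system used in the experiment: each sensory channel is taken to yield values in [0, 1), a node covers one region of one channel, growth splits a randomly chosen leaf into two halves, and pruning removes sibling leaves that were used fewer than min_uses times. The names Node, play and min_uses are invented for this illustration, and the coupling to the lexicon is left out.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

Obj = Dict[str, float]  # channel name -> value, assumed to lie in [0, 1)


@dataclass
class Node:
    """A region [lo, hi) on one sensory channel; children refine it."""
    channel: str
    lo: float = 0.0
    hi: float = 1.0
    uses: int = 0
    children: List["Node"] = field(default_factory=list)

    def contains(self, obj: Obj) -> bool:
        return self.lo <= obj[self.channel] < self.hi

    def walk(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.walk()

    def grow(self) -> None:
        # Growth: split this region into two halves.
        mid = (self.lo + self.hi) / 2
        self.children = [Node(self.channel, self.lo, mid),
                         Node(self.channel, mid, self.hi)]

    def prune(self, min_uses: int) -> None:
        # Pruning: drop a pair of leaves if neither has been used enough.
        for child in self.children:
            child.prune(min_uses)
        if all(not c.children and c.uses < min_uses for c in self.children):
            self.children = []


def play(trees: Dict[str, Node], topic: Obj, context: List[Obj],
         rng: random.Random) -> Optional[Node]:
    """One game: find a region holding the topic and no other object."""
    others = [o for o in context if o is not topic]
    for tree in trees.values():
        for node in tree.walk():
            if node.contains(topic) and not any(node.contains(o) for o in others):
                node.uses += 1  # success entrenches the distinction
                return node
    # Failure: expand a random leaf of a random tree, without looking at
    # which refinement would actually have helped.
    tree = rng.choice(list(trees.values()))
    leaf = rng.choice([n for n in tree.walk() if not n.children])
    leaf.grow()
    return None
```

With a single hypothetical channel hpos and two objects at 0.2 and 0.8, the first game fails and the root region is split; the second game then succeeds with the region from 0.0 to 0.5. The expansion step is blind in the sense intended in the text: it does not generalise over examples, it merely makes more distinctions available for selection.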

4.2.3 The tree metaphor 

The growth of a tree or plant is a good metaphor to visualise this selectionist approach. 
The shape of a tree appears well adapted to its environment. Typically there are more 
branches and leaves where there is more sunlight. The height of the tree reflects the 
competition of neighbouring trees or the height of surrounding buildings. The overall 
shape reflects the shape of surrounding walls or other trees. It is obvious that a tree does 
not come with “shape genes” that determine exactly which shape the tree will have in a 
particular setting. Nor does it come with sophisticated sensors and a brain inductively 
learning about the environment so as to decide on which branch the next leaf should 
grow. Instead the tree grows in all directions following a steady, usually quite regular 
growth pattern. A tree standing alone in a landscape exhibits a beautiful balanced shape, 
expanding in all directions, but when the growth is constrained, the tree reflects these 
constraints. The branches and leaves that catch more sunlight receive more resources to 
flourish and develop further, whereas those pointing towards an area with no sunlight 
are stifled in their growth and may die altogether. 

Given that the brain is a living tissue, it is possible to imagine a similar growth process 
in the brain. 10 Neural networks implementing discrimination trees could be expanding 
in all directions, just as other tissue forms. The overall growth dynamics is genetically 
determined but neutral with respect to the repertoire needed in a certain environment. 
The shape of the discrimination trees in a particular individual is due to the kinds of 
sensory data that have been produced in his interactions with the environment and their 
use and success in other cognitive processes such as language communication. The users 
of the discrimination trees and the environment act as selectionist forces molding the 
spontaneously forming repertoires. 

Note that selectionism is not applied at the level of a species as in biological evolution 
but at the level of the cognitive structures in each individual as he or she develops and
adapts during his or her lifetime. The parallel exploration and the competition of alternative
ways to categorise reality take place in a single individual interacting with the
environment and therefore are very rapid.



10 Several neurobiologists have presented suggestions and evidence in this direction. See particularly Edelman
(1987) and Changeux (1997).

4.2.4 Deriving new sensory channels

Obviously the sensory channels are a critical part of the categorisation process. When 
the sensory data is not available, it is simply not possible to develop discrimination trees 
along a particular sensory dimension. This raises the question as to where the sensory 
channels themselves may come from. The most basic raw data is directly supplied by the 
sensors themselves and is thus innately given. But processes calculating the values on 
the more complex channels must somehow form under the influence of the environment, 
and they are thus potentially partly influenced by language. 

Here again both an inductive and a selectionist approach can be envisioned. In an 
inductive approach, a learning process, such as embodied by a connectionist network, is 
fed with a series of examples of situations exhibiting particular characteristics, and the 
network becomes sensitive to the characteristics of these situations. Several concrete 
examples of such a learning process have already been studied using the neural network 
techniques mentioned earlier. 11 

In a selectionist approach, a repertoire of primitive operations is given, presumably 
implemented by the basic biochemistry of the neural systems, and there are ways to com- 
bine these operations into more complex visual programs. Those programs yielding an 
outcome which is afterwards used successfully in categorisation are retained and the 
others discarded, thus establishing a co-evolution between a repertoire of sensory channels
and a repertoire of categories. We have already done some successful experiments
in this direction, see De Jong & Steels (1999), but this theme will not be pursued further
here as a full discussion would be too much of a digression from the main line of investigation.
For the remainder of this book, the sensory channels will be pre-programmed,
although it is still up to the selectionist categorisation process to discover which ones 
are useful in the environments presented to the agents and which ones are not. 

4.2.5 Comparing approaches 

A selectionist approach is different from both the rationalist and empiricist points of view. 
In contrast with rationalism, it is not assumed that categories are universal and a priori 
shared. Categories are not innate. In contrast with empiricism, it is not assumed that 
categories are derived by induction from a large set of cases. Categories are not learned. 
What is claimed to be innate is a general purpose growth and pruning dynamics which 
could be realised by the biochemistry of neural cells. The growth process may generate 
many categories which may turn out to be useless, but eventually it settles on a repertoire 
which is adequate for the environment in which the individual finds itself. 

A selectionist approach to category formation has some characteristics that make it
look as if categories are innate. Categories may form in an individual who has never
seen a single example. They appear to pop up from nowhere, but this does not
mean that there must be genes determining this particular category. They are the
result of a random growth process which has simply generated these possibilities. At the same time,
a selectionist process has some characteristics that make it look as if categories
have been learned. The individual ends up with a repertoire which is adapted to the
environments and tasks that are indeed encountered, and this ontology keeps evolving
to remain adequate when the environment changes or new tasks are encountered. Because
the selectionist approach has characteristics of innateness as well as learning, it
is capable of helping resolve the paradoxes of the origins of categories discussed at the
beginning of this chapter. It explains adaptivity without learning and fast development,
even with a weak stimulus, but without innateness.

11 Examples are discussed in Linsker (1990).

All of this sounds intriguing and brings in a refreshingly new point of view, but does 
it really work? Can we invent the required growth and pruning dynamics and identify 
the appropriate selectionist feedback loop? The remainder of this chapter focuses on this 
question. 

4.3 Discrimination trees 

First I will introduce structures that can perform categorisation. The next section shows 
how these structures may autonomously originate in an agent interacting with the en- 
vironment. 

4.3.1 Making distinctions 

Empiricists start from the idea that categories capture what is common between objects. 
Thus [red] is supposed to capture what is common or similar to all red objects. Hence 
in theories of (formal) semantics, the meaning of a predicate is equated with the set of 
all things that belong to the class it delineates. But we can also turn things around. We 
can view a category as a way to capture what is different between objects. For example, 
the distinction between [left] and [right] is based on the horizontal position (hpos) of 
an object with respect to the viewer. The distinction is imposed on the scene (as long 
as it is compatible) instead of recognised. This may seem a subtle difference, but it has 
profound consequences, particularly for acquiring categories. 

Consider the category [large]. What do all large objects have in common? At first 
sight very little. Almost any physical object can be called large in one context or another. 
It is going to be very difficult for a learning agent to determine some commonality, even 
if given thousands of examples of large objects. In the beginning, the agent might be 
confused by commonalities in colour or position or shape. Only with some luck will the
learning algorithm start to zoom in on size. This is what makes inductive learning so
slow and why it is criticised, rightfully, by rationalists. In fact, searching for commonality
hardly makes sense for many categories: [large] is only meaningful in opposition
to [small], and [left] only makes sense in opposition to [right]. Often categorisation is
relative to the context (a block which is large when surrounded by smaller blocks may 
be categorised as small when surrounded by larger blocks) and occasionally it is relative 
to the objects themselves (a large mouse is always categorised as much smaller than a 
small elephant). 

Few people would disagree that [large] and [small] are distinctions imposed on reality
as opposed to intrinsic properties of classes of objects, but what about colour? Is
it not an example where a category is absolute? We have already seen in the previous
chapter that this is not the case. The colour that is perceived depends strongly on the surface
reflection and thus on the light conditions and light sources in the environment. What
should objectively appear blue when measuring the wavelength might actually appear 
green and vice-versa. Colour is not an intrinsic property but is actively mapped onto re- 
ality in a context-sensitive way. 12 This does not mean of course that categorisation does 
not make use of sensory data. On the contrary, without sensory data, categorisation 
would be entirely impossible. 

Once categories are available, they can be used for much more than making distinc- 
tions. For example, if an agent has the distinction between [large] and [small], he can 
use it to group the objects in a scene into two subsets: those that are large and those that 
are small. So in this case, the categories are used to group objects based on a characteristic
they have in common, namely being large or being small. My main argument here
is that categories form driven by discrimination tasks; afterwards they can be used for
many other semantic processes, including classification.

4.3.2 Categorisers 

I will refer to the process capable of making a distinction as a categoriser. It operates on 
the output of a sensory channel and decides whether a category or its opponent is valid. 
A categoriser keeps track of its success by maintaining an internal counter. Consider 
for example the category [large] which operates on the output of the AREA-channel. 
This channel contains a scaled value between 0.0 and 1.0 for the area of a segment. A 
distinction can therefore be made simply by dividing the set of possible values into two
halves: those whose area is between 0.0 and 0.5 and those whose area is between 0.5 and
1.0, giving us two categories, [small] and [large]. When the agent needs to categorise
an object, he checks in which region an object’s area falls. If it is between 0.0 and 0.5,
the category is [small]; if it is greater than 0.5, the category is [large].

It is clearly possible to refine each category c by introducing categorisers that further 
divide the region of possible values of c into smaller subregions. For example, the cat- 
egory [small] which is applicable if the area is between 0.0 and 0.5, can be refined by 
introducing two more specific categorisers: one responding to a region between 0.0 and 
0.25 for [very-small] and another one for a region between 0.25 and 0.5 for [medium- 
small]. The total set of distinctions using values of the same sensory channel can be 
organised in a discrimination tree (Figure 4.1). 
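
The region-halving scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Talking Heads implementation: the class name Categoriser, the methods label, matches and split, and the convention that a boundary value such as 0.5 falls into the upper region are all choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Categoriser:
    """A node in a discrimination tree: one region [lo, hi] of one channel."""
    channel: str
    lo: float = 0.0
    hi: float = 1.0
    children: List["Categoriser"] = field(default_factory=list)

    def label(self) -> str:
        # Same notation as in the text, e.g. [area 0.0-0.25].
        return f"[{self.channel} {self.lo}-{self.hi}]"

    def matches(self, value: float) -> bool:
        # Assumption: regions are half-open, except that the value 1.0
        # belongs to the region whose upper bound is 1.0.
        return self.lo <= value < self.hi or value == self.hi == 1.0

    def split(self) -> None:
        # Refine this category by dividing its region into two halves.
        mid = (self.lo + self.hi) / 2
        self.children = [Categoriser(self.channel, self.lo, mid),
                         Categoriser(self.channel, mid, self.hi)]


def categorise(root: Categoriser, value: float) -> List[str]:
    """Return the labels met while descending from the top node."""
    path, node = [], root
    while node.children:
        node = next(c for c in node.children if c.matches(value))
        path.append(node.label())
    return path


area = Categoriser("area")   # top node covering 0.0-1.0
area.split()                 # [small] and [large]
area.children[0].split()     # [very-small] and [medium-small]
print(categorise(area, 0.1))
```

An area of 0.1 is first categorised as [area 0.0-0.5] and then refined to [area 0.0-0.25], whereas an area of 0.7 stops at [area 0.5-1.0] because that branch has not been refined.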

As mentioned earlier, I will label categories by using the sensory channel from which 
a category operates, followed by the minimum and maximum values of the region carved 
out by the category. For example, [area 0.0-0.25] carves out the region [0.0,0.25] of the 
area channel. When it must be emphasised that a category belongs to a particular agent, 
for example a1, I will write [area 0.0-0.25]_a1.



12 As mentioned earlier, see the discussion about colour in Varela, Thomson & Rosch (1991). 

Figure 4.1: A discrimination tree contains a set of categorisers which categorise by check-
ing whether a sensory value falls in the region of one category or not. The 
discrimination tree shown operates on values on the gray channel. 



Conjunctive combinations of categories also have dedicated categorisers which are 
linked to the categorisers of their components (see Figure 4.2). A conjunctive combina- 
tion often yields a more efficient way to pick out the topic compared to a single, possibly 
very fine-grained distinction. For example, it might be that [area 0.0-0.25] and [gray
0.5-1.0] together are distinctive but neither is on its own. Other logical combinations
are equally of interest but I will restrict my attention to conjunctive combinations.
Conjunctive combinations of categories will be written between curly brackets
({,}) as in {[area 0.0-0.25] [gray 0.5-1.0]}.




Figure 4.2: More complex categorisers (shown as circles) are formed from the combina- 
tion of primitive categorisers. 

4.3.3 The Discrimination Game

Here is a game, which I call the Discrimination Game, that is useful for studying categorisation
in a systematic way. 13 The game is played by a single agent, randomly drawn from
a population of agents, and is equivalent to the conceptualisation phase of the guess- 
ing game. The agent perceives the scene and chooses a topic from the possible objects 
segmented in the scene. He then uses his discrimination trees, as developed so far, to 
come up with a category or a conjunction of categories that is valid for the topic, but 
not for any other object in the context. The Discrimination Game succeeds if the agent 
has found distinctive categories; otherwise the game fails. When the game succeeds, the
success counters of the categorisers involved go up.
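
One round of the game can be sketched as follows for single categories. This is a minimal sketch under assumptions: a category is represented as a hypothetical (channel, lo, hi) tuple, the scene values are invented for illustration, and the success counters are kept in a plain Counter rather than in the categorisers themselves.

```python
from collections import Counter


def holds(category, obj):
    """True if the object's value on the channel lies in the category's region."""
    channel, lo, hi = category
    value = obj[channel]
    return lo <= value < hi or value == hi == 1.0


def discrimination_game(categories, context, topic, success):
    """Return the categories valid for the topic and for no other object."""
    others = [obj for name, obj in context.items() if name != topic]
    distinctive = [c for c in categories
                   if holds(c, context[topic])
                   and not any(holds(c, o) for o in others)]
    for c in distinctive:          # a successful game raises the counters
        success[c] += 1
    return distinctive             # an empty list means the game failed


# Hypothetical scene and repertoire, for illustration only.
context = {"obj-0": {"gray": 0.2, "hpos": 0.3},
           "obj-1": {"gray": 0.8, "hpos": 0.4}}
categories = [("gray", 0.0, 0.5), ("gray", 0.5, 1.0),
              ("hpos", 0.0, 0.5), ("hpos", 0.5, 1.0)]
success = Counter()
print(discrimination_game(categories, context, "obj-0", success))
```

Here only [gray 0.0-0.5] is distinctive for obj-0: both objects lie in the left half of the scene, so neither hpos category separates them.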




Figure 4.3: A computer generated scene from the geom world. 

I will now develop some concrete examples, which imply that I make choices for the 
kinds of sensory channels and scenes that the agents use. I will first use scenes from the 
geom world as in Figure 4.3 and later real world scenes captured with a Talking Heads 
camera. 

Consider the scene in Figure 4.3. Assuming that the agent already has a well-developed 
set of categories, he could use the category [gray 0.75-1.0] (very dark) to distinguish
shape 3 from the others. On the other hand, if shape 2 is chosen as topic, the grayscale
will not be enough because shape 2 has the same grayscale as shape 1. Maybe a combination
of categories can be chosen, like [vpos 0.0-0.5] (lower) and [hpos 0.0-0.5] (left).
Indeed shape 2 is lower in the scene, as opposed to shape 1 and shape 3, and it is more 
to the left compared to shape 0 and 2. 



13 The Discrimination Game model together with the discrimination trees and their growth dynamics was
presented for the first time in Steels & Brooks (1995). Based on this paper, a new implementation of single
category discrimination was written by Angus McIntyre within the babel environment. Later on,
Joris Van Looveren re-implemented the use of conjunctive combinations.

4.3.4 The Pachinko machine

Any visitor to Japan sooner or later comes across a Pachinko hall where eager players 
sit before a machine in which a metal ball, inserted at the top, falls through a series of 
gates until it falls in a winning or a losing bin. These games are a possible metaphor to 
visualise the categorisation process based on discrimination trees. 

Imagine that for each object in the scene and for each sensory channel, there is a ball 
containing the value for that object on that channel. It is introduced in the top categoriser 
of the discrimination tree associated with that channel. For example, suppose that there 
are three objects in the scene: O1, O2, and O3, with gray-scale values 0.6, 0.4, and 0.9
respectively. We can therefore imagine three balls labeled with these data values which
are input to the top categoriser of the gray discrimination tree (Figure 4.4). A categoriser
divides the balls into two bins, those that fall in the range of one category and those that
fall into the range of the other category. In this case, the left bin contains {O2} (category
[gray 0.0-0.5]) and the right bin {O1, O3} (category [gray 0.5-1.0]).

A distinctive category is found when the ball of the topic is the only one left in one of
the bins. If the topic is O1, then this is not yet the case, because it is together with O3 in
the bin of [gray 0.5-1.0]. However, when the next categoriser is exercised, it splits the
set {O1, O3} into two subsets: {O1} for [gray 0.5-0.75] and {O3} for [gray 0.75-1.0]. [gray
0.5-0.75] is a discriminating category because only O1 is left in its bin.

Balls thus trickle down from the top to the bottom of a discrimination tree, like in the 
Pachinko game or a lottery machine. The trickling down process can stop as soon as a 
distinctive category is found because finer grained distinctions are not necessary. 
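
The trickling-down process for the three balls above can be replayed in a short sketch. The function name trickle_down and the max_depth parameter are assumptions of this example; in particular the sketch pretends that the tree has already been grown down to max_depth levels, which a real agent's tree need not be.

```python
def trickle_down(values, topic, channel, max_depth=4):
    """Follow the topic's ball down the tree until it is alone in its bin."""
    lo, hi = 0.0, 1.0
    in_bin = set(values)                     # all balls start at the top
    for _ in range(max_depth):
        mid = (lo + hi) / 2
        if values[topic] < mid:              # topic falls in the left bin
            hi = mid
        else:                                # topic falls in the right bin
            lo = mid
        in_bin = {name for name in in_bin
                  if lo <= values[name] < hi or values[name] == hi == 1.0}
        if in_bin == {topic}:                # distinctive: stop trickling
            return f"[{channel} {lo}-{hi}]"
    return None                              # no distinctive category found


gray = {"O1": 0.6, "O2": 0.4, "O3": 0.9}
print(trickle_down(gray, "O1", "gray"))      # [gray 0.5-0.75]
print(trickle_down(gray, "O2", "gray"))      # [gray 0.0-0.5]
```

With O1 as topic the first split leaves O1 and O3 together, and the second split isolates O1, as in the text. With O2 as topic the first split is already enough.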




Figure 4.4: Balls containing the data value for the different objects O1, O2, O3 in the scene.
O1 is the topic. O1 has the value 0.6 on the GRAY-channel, O2 has 0.4 and O3
0.9.

4.3.5 Competition between categories

A realistic agent has hundreds of sensory channels, and humans probably have tens of 
thousands of them, presumably grouped with respect to the domains to which they apply. 14
There are discrimination trees for each of these channels or for combinations of them, 
and each can possibly yield a distinctive category. The categorisation process can there- 
fore be envisioned like a huge Pachinko hall, in which balls are trickling down in parallel 
in hundreds or thousands of machines. It is highly likely that more than one solution 
is found when there are a lot of trees, particularly if combinations of categories are al- 
lowed as well. So, an additional competitive process must take place to rank categories 
even though multiple solutions are offered to subsequent verbalisation processes. We 
see therefore the same characteristics as for the perceptual layer, and thus the “sieve 
architecture” also applies for the conceptual layer (Figure 3.5). 

There are many possible criteria for preferring one category over another, equally 
distinctive one. The first is based on simplicity. A single category is less complex than a 
combination of categories, and a more abstract category is preferred over a more specific 
one. The second criterion is based on success in earlier games. Each categoriser monitors 
how many times it was used and how many times it was successful, i.e. how many times 
it could distinguish the topic and participate in a successful language game. Another 
criterion, that I will bring in later once I have introduced the lexical layer, refers to 
success of the lexicalisations of the category. The agent will prefer categories for which a
well-accepted way of expressing them is known to exist. When ranking categories, these
criteria are combined and the best ones enter with the most force in the lexicalisation 
layer. 
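
The text does not say how the criteria are weighed against each other, so the following sketch simply assumes a lexicographic order: fewer categories first, then the more abstract (wider) region, then the higher success rate. The tuple layout (categories, used, successful) and all numbers are hypothetical, and the lexical criterion is left out because the lexicon has not been introduced yet.

```python
def rank(candidates):
    """Order equally distinctive candidates, best first (assumed ordering)."""
    def key(candidate):
        categories, used, successful = candidate
        narrowest = min(hi - lo for _, lo, hi in categories)
        rate = successful / used if used else 0.0
        # Fewer categories, then wider regions, then higher success rate.
        return (len(categories), -narrowest, -rate)
    return sorted(candidates, key=key)


# Hypothetical candidates: (categories, times used, times successful).
a = ((("hpos", 0.0, 0.5),), 10, 6)
b = ((("width", 0.0, 0.5),), 10, 1)
c = ((("height", 0.0, 0.25),), 10, 9)
d = ((("height", 0.0, 0.5), ("width", 0.5, 1.0)), 10, 10)
for categories, used, successful in rank([d, c, b, a]):
    print(categories, successful / used)
```

Under this assumed ordering the single abstract category with the better record (a) comes first, and the conjunction (d) comes last despite its perfect record, because simplicity is applied before success.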

Notice the hidden positive feedback effect between success and use: A categoriser 
which has already achieved a higher score wins the competition, everything else being 
equal, causing its score to increase even more. This way a consistent behaviour emerges 
where the same category tends to be used in the same circumstances, similar to the 
way a walking path sometimes emerges for crossing a patch of grass between buildings. 
Initially many paths are possible, but once one path is used a bit more than others, it 
gets used more and more, as people reuse a path they perceive to be there. I will show 
later (chapter 6) that this entrenchment of a particular solution by a positive feedback 
loop can be exploited through the structural coupling between the ontology (the set of 
categories) and the lexicon (the set of form-meaning pairs verbalising categories), so that 
they become co-ordinated without a central co-ordinator. 15 



14 This is strongly suggested by psychological data on the presence of conceptual spaces in human categori- 
sation. See Gardenfors (2000). 

15 I adopt here other general principles of complex systems. The notion of structural coupling has been
introduced by Maturana & Varela (1998) and is now widely used to explain various forms of biological 
co-ordination. 

4.3.6 Variations on discrimination

There are obviously many variations on categorisation that could be imagined. For ex- 
ample, a categoriser could make use of focal points instead of regions. A focal point is 
a single significant data value of a sensory channel. The categoriser then has to com- 
pute the distance between the value for a segment and the focal point of each possible 
category. The category whose focal point is closest to the sensory value of a segment 
applies. This implements a prototype-like approach to categorisation which has been ar- 
gued to be more realistic with respect to human categorisation. 16 For example, humans 
typically label light at 482 nanometres as the most typical blue, so that a given object 
reflecting light at or near this point is categorised as [blue]. Categories based on focal 
points are interesting and have clear advantages but I will stick nevertheless in the first 
instance to binary discrimination trees operating on single sensory channels to simplify 
the explanations and to analyse better what is going on. 
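
For comparison, the focal-point variant amounts to a nearest-prototype rule. In this minimal sketch the function name and the two focal points on a gray channel are hypothetical values chosen only to show the mechanism.

```python
def nearest_focal_point(value, focal_points):
    """Return the category whose focal point is closest to the value."""
    return min(focal_points,
               key=lambda name: abs(focal_points[name] - value))


# Hypothetical focal points on a gray channel scaled to 0.0-1.0.
focal = {"light": 0.2, "dark": 0.8}
print(nearest_focal_point(0.6, focal))   # dark
print(nearest_focal_point(0.1, focal))   # light
```

Unlike a discrimination tree, this rule needs no fixed region boundaries: moving a focal point silently moves the boundary between its category and the neighbouring ones.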

Still another way to categorise reality is by imposing an order on the segments based 
on their values for a particular sensory channel. For example, we can order the seg- 
ments based on the hpos channel (i.e. from left to right) and then introduce relational 
categories like left of one segment in the series, or the left-most object. Similarly we can 
order the segments based on the height channel (i.e. in terms of their size) and then 
have categories that select the smallest (i.e. the first segment in this ordering), or those 
greater than some other one. 
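
Order-based categories of this kind can be sketched by sorting the segments on a channel. The function names and the scene values below are assumptions made for the example.

```python
def ordered(segments, channel):
    """Segment names sorted by their value on the given channel."""
    return sorted(segments, key=lambda name: segments[name][channel])


def left_of(segments, name):
    """All segments that precede the given one in the hpos ordering."""
    order = ordered(segments, "hpos")
    return order[:order.index(name)]


# Hypothetical sensor-scaled data for three segments.
segments = {"s0": {"hpos": 0.7, "height": 0.2},
            "s1": {"hpos": 0.1, "height": 0.9},
            "s2": {"hpos": 0.4, "height": 0.5}}
print(ordered(segments, "hpos")[0])      # left-most segment: s1
print(ordered(segments, "height")[0])    # smallest segment: s0
print(left_of(segments, "s0"))           # ['s1', 's2']
```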

Of course, I am well aware that this categorisation process captures only the most 
basic way of generating meaning. Human beings make extended use of metaphor, anal- 
ogy, metonymy, and other processes that adapt conceptual structures from one domain 
to another one, see Johnson (1987). But before we can study such processes we must 
understand how basic perceptually grounded categories can originate. 

4.3.7 The Discrimination Game in action 

Let us now look at some example games for an agent a1 taken from simulations using
computer generated scenes from the geom world and showing the internal structures
generated as well as the reports from the commentator. At first I will not take saliency
or context-scaling into account. The first game (Game 8) fails. It takes place near the
very beginning when a1 has practically no repertoire of distinctions yet. The scene and
the discrimination trees available so far are shown in Figure 4.5. The object labeled 0 is 
the topic. Only two channels have top level categorisers: height and width. The data 
on these channels for the scene in Figure 4.5 are shown in Table 4.1. 

height and width have been scaled with respect to the minimum and maximum 
height and width of a figure (sensor-scaling) but no context-scaling has been performed. 

The game is reported by the commentator as follows: 



16 See Varela, Thomson & Rosch (1991) and Taylor (1995). 

Table 4.1: Sensory data for the scene in Figure 4.5.

Object        HEIGHT   WIDTH
0 (square)    0.413    0.317
1 (circle)    0.410    0.410
2 (square)    0.163    0.163


Figure 4.5: Top: The scene used in Game 8. Shape 0 is the topic. Bottom: The discrimina- 
tion trees available for this game. 



Game 8 

a1 segments the context in 3 objects:
square-0, circle-1, square-2
a1 chooses square-0 as the topic
The discrimination game fails 

The game fails because for the two sensory channels for which there are discrimination 
trees, the values of the segments are all within the lower range and so no distinctive category
or category set could be found. This failure stimulates the discrimination network
to expand, but any node, including a top node of some of the other sensory channels, can
be chosen for further expansion.

The next example shows a game (Game 22) based on the scene in Figure 4.6. The topic 
is the triangle, shape 0. The discrimination trees for height and width have already 

Table 4.2: Sensory data for the scene shown in Figure 4.6.

Object          HPOS    WIDTH   HEIGHT
0 (triangle)    0.167   0.437   0.573
1 (rectangle)   0.789   0.563   0.287



Figure 4.6: Top: The scene used in Game 22. Bottom: The discrimination trees available
to the agent (panels labelled HPOS, VPOS, HEIGHT, WIDTH, GRAY, RATIO and AREA).



more than one level and a discrimination tree for hpos has developed. The relevant 
sensor-scaled data for these three sensory channels is shown in Table 4.2. 

The scene is very simple so there are several possible solutions: the triangle is more
to the left, less wide, and taller. Each of these possibilities is discovered and its
score (purely based on past performance) is looked up. The game is reported by the
commentator as follows:

Game 22

a1 segments the scene in 2 objects:
triangle-0, rectangle-1
a1 chooses triangle-0 as the topic

a1 categorises the topic as [HPOS 0.0-0.5] (score 0.57),
[HEIGHT 0.5-1.0] (score 0.09), or
[WIDTH 0.0-0.5] (score 0.0)

The discrimination game succeeds 

4.3.8 The importance of scaling and saliency

Game 22 shows at once why context-scaling and saliency are important. When we inspect
the scene in Figure 4.6, we do not quite see so clearly that the triangle is less wide than
the rectangle, so why is [width 0.0-0.5] nevertheless considered? Examination of the data
shows that the width values, 0.437 for the triangle and 0.563 for the rectangle, are very
close to each other, but just by luck fall within the two regions carved out by the width 
discrimination tree. On the other hand, the values on the hpos channel are much further 
apart and so they are preferred. 

As we have seen in the previous chapter, saliency is the smallest of the absolute values 
of the distance between the topic and any other object. It gives us an indication why a 
certain sensory channel should be preferred over another. For the scene in Game 22, the 
saliency for each channel with respect to the triangle is as in Table 4.3: 

Table 4.3: Sensory data for the scene in Game 22.

HPOS    WIDTH   HEIGHT
0.622   0.125   0.286



From this table we see immediately that hpos is the most salient channel and should 
be preferred by far, followed by height and then width. Thus we can expect the agent 
to choose hpos based on saliency. When the saliency threshold is set to a reasonably 
high value, the other channels would not even be considered: they would not pass the
sieve of the perceptual layer.
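
The saliency definition given above (the smallest absolute distance between the topic and any other object on a channel) can be checked against the sensor-scaled data of Table 4.2. The function name and the dictionary layout are assumptions of this sketch.

```python
def saliency(context, topic, channel):
    """Smallest absolute distance between the topic and any other object."""
    return min(abs(context[topic][channel] - obj[channel])
               for name, obj in context.items() if name != topic)


# Sensor-scaled values from Table 4.2.
scene = {"triangle-0": {"hpos": 0.167, "width": 0.437, "height": 0.573},
         "rectangle-1": {"hpos": 0.789, "width": 0.563, "height": 0.287}}

by_saliency = sorted(["hpos", "width", "height"],
                     key=lambda ch: saliency(scene, "triangle-0", ch),
                     reverse=True)
for ch in by_saliency:
    print(ch, round(saliency(scene, "triangle-0", ch), 3))
```

The ordering comes out as hpos, height, width, in line with the text. Computed from the rounded table entries the width saliency is 0.126 rather than the 0.125 printed in Table 4.3, presumably because the table was derived from unrounded values.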

The sensory channel data for the same scene now scaled for context is shown in Ta- 
ble 4.4. Such context-scaling pulls the data further apart and makes categorisation there- 
fore much easier and much more stable, but the information on saliency is lost and so it 
is no longer clear which channel is to be preferred for reasons of saliency. 

Table 4.4: Sensory data for the scene shown in Figure 4.6.

Object          HPOS    WIDTH   HEIGHT
0 (triangle)    0.0     0.0     1.0
1 (rectangle)   1.0     1.0     0.0



So the best thing to do (and this is what the Talking Heads effectively do) is to first 
perform sensor-scaling, then compute saliency to determine which channel should be 
preferred, then perform context-scaling, to get clearly distinguished sensory values, and 
then do categorisation. Note that context-scaling has the same effect as using prototype- 
based categorisation because the actual values are pulled towards extremes, and thus 
perceived as prototypes. Context-scaling is not always desirable. For example, in the
case of colour categorisation the actual channel data should be maintained because here 
categorisation takes place on the basis of actual values. 
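
The context-scaled values in Table 4.4 are what one gets by stretching each channel so that the smallest value in the scene becomes 0.0 and the largest 1.0. The sketch below assumes exactly that min-max formula; the precise definition belongs to the previous chapter and is not repeated in this section, so the formula and the handling of a flat channel are assumptions.

```python
def context_scale(values):
    """Rescale one channel so the scene's values span 0.0 to 1.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: a channel with identical values is left unchanged.
        return dict(values)
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


# Sensor-scaled hpos and height values from Table 4.2.
hpos = {"triangle-0": 0.167, "rectangle-1": 0.789}
height = {"triangle-0": 0.573, "rectangle-1": 0.287}
print(context_scale(hpos))
print(context_scale(height))
```

After scaling, the triangle sits at 0.0 on hpos and at 1.0 on height, so the top-level categorisers separate the two objects without depending on where the raw values happened to fall.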

4.3.9 Combinations of categories 

The next game (Game 24) is based on the scene in Figure 4.7. The topic is triangle-0. The
discrimination trees are the same as for Game 22. The game (based on sensor-scaled 
values) succeeds with a conjunctive combination of two categories: 

Game 24

a1 segments the scene in 4 objects:
triangle-0, triangle-1, square-2, rectangle-3
a1 chooses triangle-0 as topic
a1 categorises the topic as
{[HEIGHT 0.0-0.5] [WIDTH 0.5-1.0]}

The discrimination game succeeds 




Figure 4.7: The scene used in Game 24. 

The relevant data after sensor-scaling for the two sensory channels involved is shown 
in Table 4.5. 

[height 0.0-0.5] is valid for triangle-0 and square-2 but filters out the other segments. 
[width 0.5-1.0] is valid for triangle-0 and triangle-1 and filters out the others. The con- 
junctive combination of these two categories only retains triangle-0 and is therefore the 
one that is chosen. 
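
The filtering argument of Game 24 can be replayed with the sensor-scaled values of Table 4.5. In this sketch a category is a hypothetical (channel, lo, hi) tuple, and only the top-level halves of the height and width trees are considered, which is a simplification of the trees the agent actually has.

```python
from itertools import combinations


def extension(category, context):
    """Names of the objects for which the category is valid."""
    channel, lo, hi = category
    return {name for name, obj in context.items() if lo <= obj[channel] < hi}


def distinctive_conjunctions(categories, context, topic):
    """Pairs of categories that together retain only the topic."""
    return [(a, b) for a, b in combinations(categories, 2)
            if extension(a, context) & extension(b, context) == {topic}]


# Sensor-scaled values from Table 4.5.
scene = {"triangle-0": {"height": 0.170, "width": 0.513},
         "triangle-1": {"height": 0.653, "width": 0.570},
         "square-2": {"height": 0.213, "width": 0.213},
         "rectangle-3": {"height": 0.613, "width": 0.310}}
halves = [("height", 0.0, 0.5), ("height", 0.5, 1.0),
          ("width", 0.0, 0.5), ("width", 0.5, 1.0)]
print(distinctive_conjunctions(halves, scene, "triangle-0"))
```

The only pair found is [height 0.0-0.5] with [width 0.5-1.0], the combination reported by the commentator.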

4.3.10 A real world scene 

The next example is taken from a series of discrimination games played by physically 
instantiated agents using real world images. The series is discussed more extensively in 

Table 4.5: Sensory data for the scene in Figure 4.7.

Object          HEIGHT   WIDTH
0 (triangle)    0.170    0.513
1 (triangle)    0.653    0.570
2 (square)      0.213    0.213
3 (rectangle)   0.613    0.310



Chapter 7. The agent a2 has captured the image shown in Figure 3.7 (top left) and done the 
necessary segmentation and gathering of sensory characteristics. The resulting sensory 
values (after sensor-scaling) for the segments are shown in Table 4.6. Object-0 has been
selected as the topic. 

Table 4.6: Sensory data from a real world scene with segmentation shown in Figure 3.7.

channel   obj-0   obj-1   Saliency
HPOS      0.27    0.16    0.11
VPOS      0.20    0.20    0.0
HEIGHT    0.15    0.15    0.0
WIDTH     0.10    0.11    0.01
AREA      0.10    0.10    0.0
R         0.23    0.25    0.02
G         0.32    0.34    0.02
B         0.63    0.65    0.02



Clearly hpos is the most salient channel and should be preferred by the agent. When 
performing context-scaling, the two values for hpos are drawn apart with 1.0 for object-0
and 0.0 for object-1 so that the category [hpos 0.5-1.0] (to the right) easily distinguishes
the topic (object-0) from object-1. The game is reported as follows: 

Game 3 

a2 is the speaker, a1 is the hearer.
a2 segments the context into 2 objects:
object-0 object-1
a2 chooses object-0 as the topic
a2 categorises the topic as [HPOS 0.5-1.0]

This example illustrates well why the categorisation of the Talking Heads is so robust 
and why agents often share the same conceptualisation even if the details of their raw
perception are quite different. The saliency factor helps to focus the agents on those aspects
of the scene that stand out. There is an enormous reduction of variation, first by
scaling then by the categorisation process itself. 



4.4 An ecology of distinctions 

The previous section introduced mechanisms that enable agents to find a distinctive cat- 
egory or conjunctive combination of categories given a set of segments and data on a 
series of sensory channels for each segment. I will now focus on the issue of how
discrimination networks and hence repertoires of possible categories may develop.

4.4.1 Growth dynamics 

The process of growing categorisers is relatively straightforward. In the very beginning, 
the agent constructs top level categorisers for each channel that has contained relevant
and distinctive data at least once in the recent past. If a channel has the same data
for every possible segment it is obviously not going to be possible to find a distinctive 
category no matter how hard the agent tries. 

A new subcategoriser is constructed by taking a categoriser node in the tree and di- 
viding its range into two new subranges and thus two new subcategorisers. For example, 
if there is a categoriser [hpos 0.0-0.5], which triggers when the object is in the leftmost
half of a scene, i.e. with hpos within [0.0, 0.5], then two subcategories are created by 
dividing [0.0, 0.5] into two halves, one for the range [0.0,0.25] ([hpos 0.0-0.25] or totally 
left) and one for the range [0.25,0.5] ([hpos 0.25-0.5] or mid-left). A new categoriser is 
added to the tree for each of these halves. 

A categoriser based on a combination of categories is constructed by combining ex- 
isting categories into a new one. Of course, if done without limits, this could create 
potentially a combinatorial explosion of possibilities. In the current implementation, the 
construction of combinations is restricted by combining only those categories that have 
been partially successful in a given scene, just as only categories that have ever been 
relevant are expanded. 

There are two key parameters to the growth process: (1) which category should be 
expanded and (2) when should growth take place. In the Talking Heads experiment, 
agents expand a category which was effectively applied in the recent past, even though 
it may have failed in the game. This way the network is more likely to develop branches 
that are potentially relevant, although there is still no guarantee that the expansion gives 
the distinctions required for the case at hand, because it is not based on an in-depth 
analysis of the case. 

Growth rate is proportional to failure. The more failures occur, the higher the likelihood
of more nodes growing. The net effect is that many new nodes grow in
the beginning because there are many failures, but the repertoire of categories stabilises
once discriminatory success is steady. When the environment starts to change
again, causing new failures, more active growth is automatically triggered, which may
lead to a renewed expansion of the repertoire.
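
A growth step whose likelihood is proportional to the recent failure rate could look as follows. Everything specific here is an assumption made for illustration: the names failure_rate and maybe_grow, the rate constant, treating an empty history as total failure, and above all the random choice of the node to expand, whereas the agents in the experiment prefer a category that was applied in the recent past.

```python
import random


def failure_rate(outcomes):
    """Fraction of recent games that failed (no history counts as failure)."""
    return outcomes.count(False) / len(outcomes) if outcomes else 1.0


def maybe_grow(repertoire, outcomes, rng, rate=1.0):
    """With probability rate * failure rate, split one unexpanded node."""
    if rng.random() >= rate * failure_rate(outcomes):
        return list(repertoire)              # no growth this time
    unexpanded = [(c, lo, hi) for c, lo, hi in repertoire
                  if (c, lo, (lo + hi) / 2) not in repertoire]
    channel, lo, hi = rng.choice(unexpanded)
    mid = (lo + hi) / 2
    # The parent stays in the repertoire; its two halves are added.
    return list(repertoire) + [(channel, lo, mid), (channel, mid, hi)]


rng = random.Random(0)
repertoire = [("hpos", 0.0, 1.0)]
print(maybe_grow(repertoire, [True] * 10, rng))    # steady success: unchanged
print(maybe_grow(repertoire, [False] * 10, rng))   # steady failure: grows
```

With a history of pure success nothing is added, and with a history of pure failure (and rate 1.0) a split always happens, which reproduces the stabilising behaviour described above.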

4.4.2 Pruning dynamics 

Growth needs to be balanced by pruning. Pruning simply means that a categoriser, and
thus the branches hanging from it, is cut away. There are again two issues: (1) which nodes should
be pruned and (2) when pruning should take place. Obviously the score of a category
should play a role in deciding whether it should be pruned: categorisers that have not
been used very much or have a low success rate are prime candidates for removal, unless
any of their subcategorisers has a high score. The monitoring of use and success already 
played a role in determining which category should be preferred, so this information is 
available to decide on pruning as well. 

Whereas the growth rate is proportional to failure, the pruning rate is made propor- 
tional to success, so that in the case of a high failure rate the new categorisers are given 
time to improve their score or to grow refinements that may be successful. A new cate- 
goriser obviously should be given a grace period to encounter enough cases to prove its 
worth, otherwise it could be cut out too quickly. Categorisers therefore not only monitor 
their use and success but also their age. 
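
The selection of pruning candidates can be sketched from the three monitored quantities: use, success and age. The thresholds (grace, low, high), the dictionary layout and the statistics below are hypothetical, and the sketch only decides which nodes are candidates; how often pruning is attempted, which the text ties to the success rate, is not modelled.

```python
def rate(node):
    """Success rate of a categoriser (0.0 if it was never used)."""
    return node["success"] / node["use"] if node["use"] else 0.0


def has_good_descendant(label, nodes, high):
    """True if any subcategoriser below this node has a high score."""
    return any(rate(nodes[child]) >= high
               or has_good_descendant(child, nodes, high)
               for child in nodes[label]["children"])


def prunable(label, nodes, grace=25, low=0.1, high=0.5):
    node = nodes[label]
    if node["age"] < grace:          # still within its grace period
        return False
    if rate(node) >= low:            # doing well enough on its own
        return False
    return not has_good_descendant(label, nodes, high)


# Hypothetical statistics, for illustration only.
nodes = {
    "[gray 0.0-0.5]": {"use": 40, "success": 1, "age": 200, "children": []},
    "[area 0.0-0.5]": {"use": 40, "success": 2, "age": 200,
                       "children": ["[area 0.0-0.25]"]},
    "[area 0.0-0.25]": {"use": 30, "success": 24, "age": 120, "children": []},
    "[hpos 0.5-1.0]": {"use": 2, "success": 0, "age": 5, "children": []},
}
print([label for label in nodes if prunable(label, nodes)])
```

Only [gray 0.0-0.5] is a candidate: [area 0.0-0.5] is protected by its successful refinement, and [hpos 0.5-1.0] is too young to be judged.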

4.4.3 Average discriminatory success and repertoire size 

The Discrimination Game is a dynamical system. 17 A repertoire of categories emerges 
in an agent gradually as an attractor of the growth and pruning dynamics coupled to 
the environment. If growth is strictly proportional to failure and there is no pruning, a 
point attractor is reached as soon as the repertoire is adequate, i.e. as soon as the agent 
consistently has success for all the possible cases it encountered. However, as soon as the 
environment or the sensory capabilities of the agent change, in other words when new 
types of figures appear or when new sensory channels become available to the agent, 
we expect that the repertoire of categories starts expanding again. This could be seen as 
an illustration of the assimilation-accommodation dynamics envisioned by Piaget. 

Let me introduce a few measures to test whether all this is really happening with the 
mechanisms introduced so far. The first (crucial) measure monitors how well the agent 
is doing by tracking the average success in the most recent n Discrimination Games. 
Figure 4.8 shows the outcome of this measure, for agent a1, playing 500 Discrimination
Games in a simulation with scenes from the geom world. Success is averaged per 25
games. We see clearly that a1 has become successful in discriminating randomly chosen
topics from a consecutive series of scenes. Success rapidly climbs and reaches 100%,
even though the scenes are randomly generated combinations from a repertoire of fig- 
ures along continuously varying dimensions making for literally billions of possibilities. 
The Discrimination Game is successful because it does not try to detect invariants or 

17 The theory of complex dynamical systems, which is well developed in the natural sciences, provides the
theoretical foundation for studying the Discrimination Game. For a general introduction, see Peitgen,
Jurgens & Saupe (1992).
commonalities between consecutive cases but focuses on finding what is distinctive be-
tween the topic and the other objects. This enables the agent to make such a gigantic 
abstraction leap and to make it at an amazingly rapid speed. 
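
The success measure itself is a plain windowed average over game outcomes. In this minimal sketch the function name is an assumption and the outcome list is invented to show the shape of the computation, not taken from the experiment.

```python
def success_per_window(outcomes, window=25):
    """Average success (0.0 to 1.0) for each consecutive block of games."""
    return [sum(outcomes[i:i + window]) / len(outcomes[i:i + window])
            for i in range(0, len(outcomes), window)]


# Hypothetical outcomes: early failures, then steady success.
outcomes = [False] * 10 + [True] * 15 + [True] * 25
print(success_per_window(outcomes))   # [0.6, 1.0]
```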




Figure 4.8: The graph displays the average success per 25 Discrimination Games for a 
series of 500 games played by a single agent. Success climbs to 100%. The 
graph also displays the size of the agent’s repertoire of categories. Each scene 
contains between 3 and 6 objects. 

We can see whether a stable attractor has been reached by tracking the size of the 
repertoire, which is simply the number of categorisers in the agent’s discrimination trees.
The result of this measure is also displayed in Figure 4.8. Once success is steady, the size 
of the repertoire remains constant, which means that no new elementary distinctions 
arise nor do any distinctions disappear. This is because there has been no pruning yet 
and growth is strictly proportional to failure. 

Figure 4.9 shows some snapshots of the evolution in the discrimination trees of a1
as the simulation continues and as additional situations arise. There are expansions, 
contractions, and shifts in the constitution of the discrimination trees but gradually there 
are fewer and fewer changes, compare for example (c) and (d), as a stable core emerges. 

These simulations show that the category formation process based on a growth and 
pruning dynamics is capable of creating a repertoire of discrimination trees adequate for 
distinguishing the topic from other objects in the scene. The simulations worked with 
computer-generated, stylised environments so that it is possible to probe the behaviour 
of the mechanisms and vary the complexity of the environment. Note that the mechanisms
are neutral with respect to the type of channels supplied. The discrimination trees
and growth and pruning dynamics can operate over auditory or bodily sensory channels,
or other kinds of visual information that is produced by low level perception.

Figure 4.9: Some snapshots of an evolving repertoire in a single agent using growth and
pruning. Trees are shown after 500 games (a), 1000 games (b), 1500 games (c)
and 2000 games (d).

4.4.4 Adaptivity in categorisation 

An agent operating in a real world environment is always going to be confronted with 
situations that he has not seen before. The growth and pruning dynamics of the Discrim- 
ination Game is capable of dealing with this because new distinctions grow when the 
failure rate is increasing. Here are the results of a computer simulation based on scenes 
generated by the geom world that test whether this is indeed the case. 

The simulation starts with a new virgin agent playing a series of Discrimination Games 
involving scenes which only contain rectangles of the same graylevel. The agent has
only channels for height (0), width (1), ratio (2), gray (3) and area (4), where ratio is the
ratio between the actual area of the shape and the area of its bounding box. We expect to see that
the discrimination trees on the ratio (channel 2) and gray (channel 3) channels do not
develop because the values on those channels are the same for all objects ever seen.
This is clearly confirmed in Figure 4.10. The ratio between actual area and bounding box
area is always 1.0 in the case of rectangles and they always have the same grayscale
value. The trees on these two channels remain undeveloped, while the others develop to
very fine levels of detail in order to keep discriminating successfully.
So discrimination trees only develop as needed in a particular environment, which is
important for applying the selectionist principle to the generation of sensory channels.
The categorisation process gives feedback to the sensory processing on the adequacy of
particular sensory channels.

Figure 4.10: Two snapshots from a series of 500 discrimination games developing categories
to distinguish rectangles of the same average graylevel. Channel 2
(the ratio channel) and channel 3 (the grayscale channel) do not develop.



Let us now make the environment richer by letting the geom world also produce 
scenes with circles and triangles, as well as rectangles. If the discrimination process 
is adaptive, the ratio and gray channels should start to expand because these channels 
now contain significant data. This is indeed the case as shown in Figure 4.11. 
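
Why the ratio channel carries no information for rectangles but becomes informative once other shapes appear can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and it assumes axis-aligned shapes and a triangle that spans the full base and height of its bounding box, neither of which is stated for the geom world.

```python
import math

# Illustrative only: ratio = actual area / bounding-box area.
# Axis-aligned shapes are an assumption, not taken from the text.

def ratio_rectangle(width, height):
    # A rectangle fills its bounding box completely.
    return (width * height) / (width * height)

def ratio_circle(radius):
    # The bounding box of a circle is a square with side 2 * radius.
    return (math.pi * radius ** 2) / ((2 * radius) ** 2)

def ratio_triangle(base, height):
    # Assumes the triangle spans the whole base and height of its box.
    return (0.5 * base * height) / (base * height)

for name, value in [
    ("rectangle", ratio_rectangle(3.0, 2.0)),
    ("circle", ratio_circle(1.5)),
    ("triangle", ratio_triangle(4.0, 3.0)),
]:
    print(f"{name:10s} ratio = {value:.3f}")
```

Under these assumptions every rectangle yields the same value on the ratio channel, whatever its size, so there is nothing for a discrimination tree to split on until circles or triangles enter the scenes.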




Figure 4.11: Two snapshots from an additional series of 500 Discrimination Games after
the environment has become more complex. Ratio (channel 2) and gray
(channel 3) have started to develop.

The simulation demonstrates that the proposed discrimination process is adaptive to 
changes in the environment, because growth picks up as soon as the environment poses 
new challenges, just like the immune system starts generating a larger repertoire (and 
expanding already existing antibodies that partially matched) when challenged by the
invasion of foreign bodies. The adaptation can be tracked with the success and repertoire
size measures introduced earlier. When these measures are collected for the example 
shown in Figure 4.12, phase one shows clearly that when only rectangles are present, a 
stable repertoire gradually develops and that the success rate reaches 100 % after about 
500 games. In phase two, when other types of shapes have been introduced, the discrimi- 
nation trees begin to expand again, now exploiting the ratio and gray channels to cope 
with the new types of objects. Existing categories will of course still be adequate for 
many cases. After 500 more games, a new equilibrium is reached. Steady discrimination 
success is seen with an enlarged repertoire.

Figure 4.12: Evolution of success and ontology size in a series of 1000 Discrimination
Games played by a single agent. In phase 1, only rectangles of the same
gray level are generated by the geom world. In phase 2, additional types of
shapes are generated by the environment.

4.4.5 Real world scenes 

Very similar developments can be seen when we do experiments with embodied agents, 
capturing real world scenes through their cameras. Figure 4.13 shows two snapshots of 
developing discrimination trees for two agents. The game discussed earlier, based on 
Figure 3.7 (top), has been played with these trees. hpos is the most salient channel and
a distinction can be made easily. Note that the height and width channels have not 
developed yet because no clear cases emerged in the environment where those channels 
provided salient data. 

4.5 Conclusions 

This chapter addressed the problem of how agents may categorise their environment 
using information on sensory channels about each segment in the scene. I argued in 
favour of a selectionist approach, which generates possible solutions in a relatively ran- 
dom growth process and tests them in the cases presented by the environment. This 
approach contrasts with instructionism, where the agent is assumed to make gradual
abstraction from a series of examples using induction, and with a rationalist approach,
where perceptually grounded categories are assumed to be innate and hence derived
through genetic evolution.

Figure 4.13: The discrimination trees developed by two physically embodied agents a1
(left) and a2 (right). The top of the figure shows the trees after playing 100
games and the bottom after 200 games.

The main conclusion of this chapter is that a selectionist approach to the origins of 
categories is theoretically and practically feasible. I have defined a growth and pruning 
dynamics which leads to an adequate repertoire for discriminating one object from the 
others in the same context and I showed that the repertoire is continuously adapted 
when the environment changes. 

We will have plenty of opportunity in later chapters to further test the mechanisms for 
categorisation presented here. I will also introduce additional feedback couplings from 
the lexical layer to these categorisation processes. Nevertheless we are now sufficiently 
advanced to be able to turn to the next subtask the agents face when engaging in a 
language game: establishing a relation between categories or combinations of categories 
and an utterance.



The semiotic square captures the four entities in a linguistic interaction: the referent,
which is the object in the physical world that the speaker wants to communicate, the 
segmented image, which is the internal perception of the referent, the meaning, which 
is a category or combination of categories that picks out the referent in the present 
context, and the utterance, which is the word form or set of word forms transmitted 
by the speaker. In the previous chapters we looked at two sides of the semiotic square. 
The relation between the real world and the perceived image was studied in chapter 3 
and the relation between the perceived image and a conceptualisation that could act as 
the meaning for a language communication was studied in chapter 4. We now turn to 
the next side of the semiotic square: the relation between meaning and utterance. 

First, we need to find an architecture by which an agent can establish the relation between
form and meaning, in other words verbalise a meaning to produce an utterance and 
parse an utterance to retrieve its meaning. This mechanism needs to be flexible enough 
to deal with the unavoidable synonymy and ambiguity that will arise. Second, we need 
to find a mechanism by which an agent can acquire and help construct the lexicon of the 
group. A shared lexicon should emerge through the distributed activities of the agents 
without any prior design or global co-ordination. Third, the lexicon formation process 
should scale up to handle a growing and ever changing set of meanings and continue 
to work even with large populations whose constitution changes in time. It should be 
possible for new agents to enter the population and acquire the existing language and for 
agents to leave without destabilising the whole system. These are formidable challenges, 
particularly because we want to find the simplest possible solution, something a one 
year old child could do without fully developed intelligence. 

Immediately we observe a major difficulty. In realistic language games, where agents 
cannot inspect each other’s brain states nor transmit meanings directly, there is no feedback
about the meaning of a word, only about the referent. For example, when a speaker
says wabo and the hearer has correctly pointed to the referent that the speaker intended,
neither the speaker nor the hearer can know whether they were using the same mean- 
ing. They can only know that they arrived at the same referent. When a game fails, the 
speaker can only point to the topic and the hearer then tries to figure out what possi- 
ble meaning could have been applicable. The speaker cannot communicate directly the 
“right” meaning, and very often more than one meaning is possible to distinguish a topic 
from other objects in the context, so that the hearer will not necessarily guess the meaning
used by the speaker. I will call this the gavagai-problem, because Quine used this
word to illustrate exactly this difficulty. 1 Quine evoked the problem of an anthropologist
trying to figure out what a native speaking an unknown language might mean when he
utters gavagai while pointing to a white rabbit scurrying by.

1 See Quine (1960: 29-30).

In this chapter, I will bypass the gavagai-problem by assuming that agents get direct 
feedback about the meaning of a word. This is done by assuming that all agents share 
the same perception, that they already have a repertoire of shared meanings, that for 
every agent a particular meaning always picks out a single referent, and that a given 
referent is conceptualised with the same meaning by every agent. This scaffold allows 
us to focus on the problem of how form-meaning associations might form and propa- 
gate in a population without worrying how agents get feedback about the meanings of 
forms. However, it does mean that we cannot do experiments with embodied physical
agents but will have to accept the limitation of working only with computer simulations. 
The next chapter will take the scaffold away, as any serious theory for word meaning 
acquisition should. I will then show that given an appropriate coupling between lexical- 
isation and categorisation, a communication system can still get off the ground based on 
the mechanisms described in this chapter. 



5.1 Inventing a lexicon 

I will now introduce another game, the Naming Game, to allow us to focus on the origin 
of the lexicon. The Naming Game defines a situation requiring a group of distributed 
autonomous agents to develop and use a shared lexicon relating forms and meanings, 
assuming they have a shared repertoire of meanings and get direct feedback about what 
meaning corresponds to a certain form. The Naming Game can be thought of as the 
lexical side of the Guessing Game. 2 The game can be implemented with different mech- 
anisms compared to the ones I will use, so it defines a task setting in which different 
solutions can be compared. See also Hutchins & Hazlehurst (1995). A different task 
setting for studying language acquisition (but not how a language may emerge from 
scratch) is illustrated in Regier (1996). In this case, the agents are shown examples and 
counter-examples together with words they should use in each case. 

It can be objected that the Naming Game (and the Guessing Game) already assumes that 
the agents want to communicate. This is true, but the game can be embedded in a larger 
setting where communication is vital for survival. An example of such a setting is dis- 
cussed in Werner & Dyer (1991). 

Another issue concerns the evolution of the game itself. This topic is discussed in
Hurford (1989), which is also the earliest paper posing the problem of the origins of
a lexicon through computational simulations. See also Oliphant (1996). Hauser (1996) 
contains a further discussion of these topics from the viewpoint of biological continuity. 

The Naming Game is played by two agents, a speaker and a hearer, which are picked 
randomly from a population. The speaker selects a meaning from the shared repertoire 
of meanings, looks up a possible word for this meaning in his lexicon, and transmits 
the word to the hearer. The hearer interprets the word by looking it up in his lexicon,
and transmits the meaning he thus obtained. If this meaning is the one that the speaker
originally had in mind the game succeeds, otherwise the game fails. When the game
fails, the speaker communicates the meaning directly so that the hearer can acquire a
new form-meaning pair for later conversations. When a speaker does not have a word
yet for a meaning he wants to communicate, he may create a new one.

2 The Naming Game and associated computer simulations were presented for the first time in Steels (1996).

5.1.1 Representing lexical associations 

What cognitive architecture do agents need to engage in naming games? Clearly they 
need some sort of associative memory to store their individual lexicons. Let us assume 
that agents can construct and recognise arbitrary consonant-vowel combinations, like 
coba or wabidu, for forming words, and that they have a repertoire of possible meanings 
in the form of categories, for example [left], [dark], [large], etc. The contents of the 
associative memory of a single agent can be displayed in a table as follows: 

Table 5.1: Associative memory of a single agent

meaning    form
[dark]     coba
[large]    wabidu



As an agent is acquiring his lexicon, there are going to be stages when he is not yet sure 
about the meaning of a certain form. So it must be possible for the agent to store different 
meanings for the same form and different forms for the same meaning. This can easily 
be done by extending the memory capacity to cross-associate multiple items. Agents 
can then handle ambiguity (one word can have different meanings) and synonymy (one 
meaning can be associated with many different words). 

A speaker can only transmit a single choice for expressing the meaning. When there 
are alternative words for the same meaning in his lexicon, he must decide which one to 
use and this decision should be such that it maximises success in the game. To estimate 
this success, each agent should monitor for each form-meaning association how success- 
ful it has been, which could be implemented by associating with every form-meaning 
pair a score. The score of a form-meaning pair is specific to an agent and based only 
on his own local interactions with other agents, in line with the principle that no single 
agent has a complete overview of the lexicon nor controls the others. An example of 
a lexicon with multiple associations and a score for every association is illustrated in 
Table 5.2. 

From this table we see that the agent prefers to use the word pama for [dark] and
limiri for [large].

Table 5.2: Example lexicon with multiple associations

           coba   zapo   bila   pama   wabidu   limiri
[dark]     0.3    0.2    0.1    0.8    -        -
[large]    -      -      -      0.5    0.3      0.6
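
To make the associative memory concrete, here is a minimal sketch of how one agent's lexicon could be stored and queried. The scores are copied from Table 5.2; the dictionary layout and the function names (forms_for, meanings_for, preferred) are assumptions made for illustration, not the representation used in the experiments.

```python
# Scores copied from Table 5.2; the dict layout itself is an assumption.
# Keys are (meaning, form) pairs, values are this agent's scores.
lexicon = {
    ("dark", "coba"): 0.3,
    ("dark", "zapo"): 0.2,
    ("dark", "bila"): 0.1,
    ("dark", "pama"): 0.8,
    ("large", "pama"): 0.5,
    ("large", "wabidu"): 0.3,
    ("large", "limiri"): 0.6,
}

def forms_for(lexicon, meaning):
    """Map each form stored for `meaning` to its score (synonyms)."""
    return {f: s for (m, f), s in lexicon.items() if m == meaning}

def meanings_for(lexicon, form):
    """Map each meaning stored for `form` to its score (ambiguity)."""
    return {m: s for (m, f), s in lexicon.items() if f == form}

def preferred(candidates):
    """Highest-scoring key, or None when nothing is stored."""
    return max(candidates, key=candidates.get) if candidates else None

print(preferred(forms_for(lexicon, "dark")))     # pama
print(preferred(forms_for(lexicon, "large")))    # limiri
print(preferred(meanings_for(lexicon, "pama")))  # dark
```

Keying the table on (meaning, form) pairs lets the same form appear under several meanings and the same meaning under several forms, which is all that is needed to handle ambiguity and synonymy.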



5.1.2 Updating the score 

One of the crucial aspects of the Naming Game model is how scores are updated based 
on the outcome of a game. Intuitively the score should be related to use and success. The 
more a word is used and the more success it has, the higher the score should be. Moreover 
there should be a time dimension, as recent use and success should obviously contribute 
more to the current score. The following is a scheme that captures these characteristics. 

Every time an agent successfully uses a form-meaning pair for speaking, he increments
the score with a specific increment δ. δ is relatively high, typically equal to 0.1.
The scores of competing associations, i.e. associations that use another form for the
same meaning, are decremented with δ. The score remains however bounded between
0.0 and 1.0. This way the “best” form-meaning pair stands out more clearly next time
around. 3

When an agent plays the role of hearer, he also increments the association that was
successful with δ, and decrements competing associations, i.e. associations that relate
another meaning to the same form. When a game fails, the associations used by the
speaker and the hearer that contained the transmitted word form are both decremented.

The operations of speaker and hearer are summarised in Figure 5.1, assuming that
speaker and hearer both use the score table above (in reality they of course always
have different score tables). The speaker collects all possible forms for a given meaning
[dark], chooses the one with the highest score (pama), and transmits that form.
The hearer collects all possible meanings for the transmitted form ([dark], [large]) and
again chooses the one with the highest score. If the hearer’s meaning is equal to that
of the speaker, the game succeeds. The scores of the two used associations increase and
the others decrease, implementing lateral inhibition. If the game fails, only the two used
associations go down. 4
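
The update scheme just described could be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: the function names, the role argument and the dictionary layout keyed on (meaning, form) pairs are hypothetical, and the increment of 0.1 is the typical value mentioned in the text.

```python
def bounded(score):
    """Keep a score within the interval [0.0, 1.0]."""
    return max(0.0, min(1.0, score))

def update_scores(lexicon, meaning, form, success, role, delta=0.1):
    """Adjust one agent's scores after a game (illustrative sketch).

    lexicon maps (meaning, form) pairs to scores; role is "speaker"
    or "hearer". The used pair must already be in the lexicon.
    """
    used = (meaning, form)
    if not success:
        # After a failure only the used association goes down.
        lexicon[used] = bounded(lexicon[used] - delta)
        return
    lexicon[used] = bounded(lexicon[used] + delta)
    for (m, f) in list(lexicon):
        if (m, f) == used:
            continue
        # Speaker: competitors are other forms for the same meaning.
        # Hearer: competitors are other meanings for the same form.
        is_competitor = (m == meaning) if role == "speaker" else (f == form)
        if is_competitor:
            lexicon[(m, f)] = bounded(lexicon[(m, f)] - delta)

table = {
    ("dark", "coba"): 0.3, ("dark", "zapo"): 0.2, ("dark", "bila"): 0.1,
    ("dark", "pama"): 0.8, ("large", "pama"): 0.5,
    ("large", "wabidu"): 0.3, ("large", "limiri"): 0.6,
}
speaker, hearer = dict(table), dict(table)
update_scores(speaker, "dark", "pama", success=True, role="speaker")
update_scores(hearer, "dark", "pama", success=True, role="hearer")
print(speaker)
print(hearer)
```

With the scores of Table 5.2, a successful game about [dark] raises pama for both agents, lowers coba, zapo and bila for the speaker, and lowers the [large] reading of pama for the hearer.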

It is possible to impose an even stronger lateral inhibition, by assuming that in the
case of success, the speaker decreases the score of all the associations that imply the
word used in the game but with another meaning, and the hearer decreases the score of
all the associations that imply the same meaning but with another word. This obviously
requires additional processing from the side of the agents.

3 A systematic investigation of alternatives for the updating function is contained in Oliphant (1997). The
dynamics of the mechanisms used in the Talking Heads experiments are being investigated in the Ph.D.
thesis of Frederic Kaplan.

4 More or less neural realism can be introduced to model this associative memory. In our experiments we
have a perfectly working memory that can store an association as soon as it has seen it once. This makes
theoretical investigation easier and makes it possible to better follow the simulations and experiments. An
example of a neural network solution to lexical memory is discussed in Cangelosi & Parisi (1996).

Figure 5.1: Score adjustments after a successful game. Scores of used associations go up
and their competitors go down.

5.1.3 Constructing and acquiring words 

When virgin agents start playing naming games, their associative memories are com- 
pletely empty. Each agent needs two additional activities to get a lexicon emerging: 

• When an agent does not have a word for a meaning he wants to communicate, he is
allowed to create a new word (by random combination of vowels and consonants)
and add that to his lexicon. This happens with a certain probability, the word
creation rate w_c. This rate reflects how “free” the agent feels to extend the
lexicon. Agents are assumed to have a shared repertoire of syllables which they
can all produce and recognise.

• When an agent hears a word he has never heard before, he may add this new
word to his repertoire. Again this happens with a certain probability, the word
absorption rate w_a. This rate reflects the critical attitude with which an agent
accepts the linguistic authority of other agents.
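
These two activities could be sketched as below. The consonant and vowel inventories, the word length of one to three syllables, the function names and the initial score of 0.0 for a freshly stored association are all assumptions made for illustration; the text only specifies that words are random consonant-vowel combinations and that creation and absorption each happen with a certain probability.

```python
import random

CONSONANTS = "bdfgklmprtwz"  # hypothetical shared inventory
VOWELS = "aeiou"             # hypothetical shared inventory

def new_word(rng):
    """Random word of one to three consonant-vowel syllables."""
    n_syllables = rng.randint(1, 3)
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(n_syllables)
    )

def maybe_create(lexicon, meaning, rng, creation_rate=0.1):
    """Speaker side: invent a word for `meaning` with the creation rate."""
    if rng.random() < creation_rate:
        form = new_word(rng)
        lexicon[(meaning, form)] = 0.0  # initial score is an assumption
        return form
    return None

def maybe_absorb(lexicon, meaning, form, rng, absorption_rate=1.0):
    """Hearer side: store an unknown pair with the absorption rate."""
    if (meaning, form) not in lexicon and rng.random() < absorption_rate:
        lexicon[(meaning, form)] = 0.0  # initial score is an assumption
        return True
    return False

rng = random.Random(0)
speaker, hearer = {}, {}
word = maybe_create(speaker, "small", rng, creation_rate=1.0)
maybe_absorb(hearer, "small", word, rng)
print(word, speaker, hearer)
```

Because rng.random() returns a value in the half-open interval [0, 1), a rate of 1 always triggers the activity and a rate of 0 never does.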

These rates are not critical but must of course be positive. Experiments continue to work
when agents always make a new word (w_c = 1) and always absorb the word of the other
(w_a = 1).

5.1.4 The Naming Game in action

To become more familiar with the Naming Game, I will now go through a few examples 
of its application, assuming a group of five agents: A = {a1, a2, a3, a4, a5} and five
possible shared meanings: M = {[dark], [large], [light], [small], [red]}. 

Here is a trace of a first game as reported by the commentator, when the agents do not 
have any lexicon at all. 

Game 0 

a5 is the speaker. a3 is the hearer. 
a5 categorises the topic as [LIGHT] 
a5 does not have a word for [LIGHT] 

This trace lists the number of the game, the speaker, the hearer and the categorisation 
of the topic. The speaker did not have a word and did not create one (because the word
creation probability is w_c = 0.1): the game has failed. In the beginning most games fail
if the word creation rate has been set to a low value.

In the next game shown below, the speaker is a4, the hearer a5 and the topic [small].
Now the speaker is successful in creating a word, namely di. The hearer receives the 
word, does not know it, but stores it in association with [small]. The game still fails. 

Game 29

a4 is the speaker. a5 is the hearer. 
a4 categorises the topic as [SMALL] 
a4 creates a new word: di 
a5 does not know di 
a4 points to the topic 
a5 categorises the topic as [SMALL] 
a5 stores di as [SMALL] 

In game 32, something similar happens. This time a5 creates a new word pida for [large].
a3 does not know the word but stores it. 

Game 32 

a5 is the speaker. a3 is the hearer. 
a5 categorises the topic as [LARGE] 
a5 creates a new word: pida 
a5 says: pida 
a3 does not know pida 
a5 points to the topic 
a3 categorises the topic as [LARGE] 
a3 stores pida as [LARGE] 

A first success occurs in game 43, when a5 again uses pida for [large]. a3 hears pida,
has associated it in his lexicon with [large], and so the game succeeds.

Game 43

a5 is the speaker. a3 is the hearer.
a5 categorises the topic as [LARGE]
a5 says: pida
a3 interprets pida as [LARGE]
a3 points to the topic
a5 signals OK









It is quite tedious to go through such games by hand. For large populations of agents or 
meanings, even the most diligent researcher soon loses patience. Fortunately it is not so 
difficult to implement the Naming Game model on a computer. This makes large-scale 
simulations, even with hundreds of agents and meanings feasible, and ensures that they 
have been done correctly. All traces and graphs of games reported in this book have been 
produced by computer simulations or physical experiments with the Talking Heads. 
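
To give an impression of how little is needed, here is a compact sketch of a Naming Game simulation. It is not the program that produced the traces in this book. The names play and simulate, the syllable inventory, the initial score of 0.0 and the random breaking of ties between equally scored candidates are assumptions of this sketch; the rates and the increment follow the values mentioned in the text.

```python
import random

MEANINGS = ["dark", "large", "light", "small", "red"]

def new_word(rng):
    # One or two consonant-vowel syllables (inventory is hypothetical).
    return "".join(rng.choice("bdfgklmpt") + rng.choice("aeiou")
                   for _ in range(rng.randint(1, 2)))

def best(scores, rng):
    """Highest-scoring key; ties are broken at random (an assumption)."""
    if not scores:
        return None
    top = max(scores.values())
    return rng.choice([k for k, s in scores.items() if s == top])

def adjust(lexicon, pair, amount):
    lexicon[pair] = max(0.0, min(1.0, lexicon.get(pair, 0.0) + amount))

def play(speaker, hearer, rng, w_c=0.1, w_a=1.0, delta=0.1):
    """Play one game; lexicons map (meaning, form) to a score."""
    meaning = rng.choice(MEANINGS)
    form = best({f: s for (m, f), s in speaker.items() if m == meaning}, rng)
    if form is None:
        if rng.random() >= w_c:
            return False              # no word and none created
        form = new_word(rng)
        speaker[(meaning, form)] = 0.0
    guess = best({m: s for (m, f), s in hearer.items() if f == form}, rng)
    if guess == meaning:
        adjust(speaker, (meaning, form), +delta)
        adjust(hearer, (meaning, form), +delta)
        for (m, f) in list(speaker):  # other forms for the same meaning
            if m == meaning and f != form:
                adjust(speaker, (m, f), -delta)
        for (m, f) in list(hearer):   # other meanings for the same form
            if f == form and m != meaning:
                adjust(hearer, (m, f), -delta)
        return True
    adjust(speaker, (meaning, form), -delta)
    if guess is not None:
        adjust(hearer, (guess, form), -delta)
    if (meaning, form) not in hearer and rng.random() < w_a:
        hearer[(meaning, form)] = 0.0  # learn from the direct feedback
    return False

def simulate(n_agents=5, n_games=500, seed=0, **rates):
    rng = random.Random(seed)
    agents = [dict() for _ in range(n_agents)]
    history = []
    for _ in range(n_games):
        speaker, hearer = rng.sample(agents, 2)
        history.append(play(speaker, hearer, rng, **rates))
    return agents, history

if __name__ == "__main__":
    agents, history = simulate()
    print("successes in the last 50 games:", sum(history[-50:]))
```

The list returned as history records the outcome of every game, so the same run can afterwards be inspected with whatever measure one prefers.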



5.1.5 Characterising the lexicon 

The individual lexicon of one agent, a5, after 100 games is shown in Table 5.3.

Table 5.3: Associative memory of a single agent after 100 games.

Meaning    Word    Score
[large]    pida    0.20
           fobu    0.1
[light]    gi      0.0
[small]    di      0.10



Both the words di and pida are present but with weak scores. There are two synonyms
for [large]: pida and fobu, but pida is preferred. There is a word gi which is available for
[light] but does not have a positive score, because successive trials failed to yield a
successful game.

A table such as the one above only represents the lexicon of a single agent. It is highly 
unlikely that two agents share the same lexicon because each agent will have had dif- 
ferent encounters and hence different language experiences. A picture of the lexicon of 
the group from the viewpoint of an outside observer can be obtained by inspecting the 
internal states of each agent to construct the group lexicon. It groups the dominating 
meaning-form associations for all possible meanings and the frequency of each associa- 
tion. It gives a picture of “the” lexicon in the group. The group lexicon for the complete 
population of five agents after 50 language games is shown in Table 5.4. 

This reflects a situation where 40 % of the agents prefer to name [large] with pida, 60 %
of the agents use gi for [light], and 40 % use di for [small]. The other meanings ([dark]
and [red]) do not have names yet.

Table 5.4: Population lexicon after 50 games

Meaning    Word    Frequency
[large]    pida    0.40
[light]    gi      0.60
[small]    di      0.40



Note that the group lexicon is not known by the agents and is not stored anywhere in 
the total system. The only information which is locally stored in each agent is his own 
lexicon, which might be quite different from the group lexicon. For example, the
lexicon of a5 shown earlier is different from the group lexicon. a5’s score for gi is 0.0 
even though the word is already preferred by 60 % of the agents according to the group 
lexicon. The group lexicon is a macroscopic structure that we as observers construct 
from inspecting the internal states of the agents. 
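
The observer's construction of the group lexicon could be sketched as follows. The function names and the toy population are invented for illustration, and the sketch assumes that any stored association counts as a preference, even one with a score of 0.0; the text does not say how such cases were treated.

```python
from collections import Counter

def preferred_form(lexicon, meaning):
    """Highest-scoring form this agent stores for `meaning`, or None."""
    forms = {f: s for (m, f), s in lexicon.items() if m == meaning}
    return max(forms, key=forms.get) if forms else None

def group_lexicon(agents, meanings):
    """For each meaning, the dominant form and the fraction of agents
    that prefer it. Meanings that nobody can name are left out."""
    table = {}
    for meaning in meanings:
        votes = Counter(preferred_form(lex, meaning) for lex in agents)
        votes.pop(None, None)  # agents without any word do not vote
        if votes:
            form, count = votes.most_common(1)[0]
            table[meaning] = (form, count / len(agents))
    return table

# Toy population of five agents (illustrative values, not from a run).
agents = [
    {("large", "pida"): 0.2, ("large", "fobu"): 0.1},
    {("large", "pida"): 0.1},
    {("large", "fobu"): 0.3},
    {},
    {("light", "gi"): 0.0},
]
print(group_lexicon(agents, ["large", "light", "dark"]))
```

Note that the function needs read access to every agent's memory, which is exactly why the result exists only for the observer and never for the agents themselves.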

Let us now continue the simulation. Here are two additional games showing how ga
propagates from a4 to a2 in game 101 and from a2 to a1 in game 104.

Game 101

a4 is the speaker. a2 is the hearer. 
a4 categorises the topic as [RED] 
a4 says: ga 
a2 does not know ga 
a4 points to the topic 
a2 categorises the topic as [RED] 
a2 stores ga as [RED] 



Game 104 

a2 is the speaker. a1 is the hearer.
a2 categorises the topic as [RED]
a2 says: ga
a1 does not know ga
a2 points to the topic
a1 categorises the topic as [RED]
a1 stores ga as [RED]



After a total of 250 games, the consensus is complete. The group lexicon is shown in 
Table 5.5. 

Once the agents have reached this stage, the lexicon does not change anymore, be- 
cause all agents now prefer the same word for each possible meaning and would never 
choose another one nor encounter another one. 

Table 5.5: Consensus in group lexicon, reached after 250 games.

meaning    form    frequency    meaning    form    frequency
[dark]     go      1.00         [large]    pida    1.00
[light]    gi      1.00         [small]    di      1.00
[red]      ga      1.00         -          -       -

The lexicons of individual agents contain quite a few form-meaning pairs that did not
make it into the shared lexicon that gradually emerged. It is entirely feasible to envision
a pruning mechanism that would eliminate from memory those form-meaning pairs
whose success has been non-existent (for example because an agent created it as speaker
but it was never picked up by anybody else), or whose score has become zero (because
another word became dominant). The impact of such a forgetting function has not been
explored yet in our experiments.

5.1.6 Monitoring 

I use various measures both for actual lexicon use and for the coherence and evolution 
of the lexicons of the individuals to see better what is happening. The first and easiest 
measure is the average game success, also called the communicative success, of a 
population of agents A in a set of n language games. When this measure is graphed 
continuously for consecutive sets of games, the progress in the population towards suc- 
cessful communication can be followed easily. This is shown in Figure 5.2 which plots 
data from the games discussed in the previous paragraphs. We see at once that aver- 
age success climbs from a starting point of zero to a maximum of 1.0. This can only be 
because a shared lexicon emerged in the population. 
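
Computing this measure is straightforward. A minimal sketch, assuming the outcomes of the games are available as a list of booleans (the function name and the window argument are hypothetical):

```python
def average_success(outcomes, window=10):
    """Fraction of successful games in each consecutive block of
    `window` games; an incomplete final block is ignored."""
    return [
        sum(outcomes[i:i + window]) / window
        for i in range(0, len(outcomes) - window + 1, window)
    ]

# Ten failures, then five successes and five failures, then ten successes.
outcomes = [False] * 10 + [True] * 5 + [False] * 5 + [True] * 10
print(average_success(outcomes))  # [0.0, 0.5, 1.0]
```

A window of 10 corresponds to the averaging used for the five-agent run, and a window of 25 to the one used for the Discrimination Game graphs.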

When the population (both of meanings and agents) is larger, one would expect that 
it takes longer to reach total average success. This is indeed the case (Figure 5.3). 

We see for example that for 20 agents and 20 meanings success climbs to total success
after about 10,000 games. This is still surprisingly low, particularly as success is
already above 95 % after about 5000 games. Games can be played in parallel by different
agents because the system is entirely distributed. If we divide the number of games by
the number of agents, we see that about 250 games are needed by the agents to get 95 %
success, which means that every meaning needs to appear about 10 times for each agent.
Interestingly enough, the larger the population of agents, the more the success curve
approximates an S-shape, which has been observed empirically in the spreading of new
linguistic conventions. The same curve shape is familiar to biologists studying models
of competitive growth, suggesting a strong relationship between ecological dynamics
and language spreading. 5



5 The S-shaped curve is discussed in McMahon (1994) p. 52. For examples of biological models with similar
properties, see May (1976).

Figure 5.2: The graph displays on the y-axis the average success every 10 games in a population
of five agents lexicalising five different meanings. The x-axis shows
the number of games. Average success rapidly climbs until it reaches total
success after about 180 games.




Figure 5.3: The graph displays the evolution of communicative success for larger and
larger populations. The number of games on the x-axis is divided by the number
of agents.

5.1.7 Measuring lexical coherence

To monitor to what extent the agents share the same lexicon, I propose a second mea- 
sure, the lexical coherence. The lexical coherence is defined as the average of the 
frequencies of all the form-meaning pairs in the group lexicon. If all agents prefer the 
same form-meaning pair for all meanings, lexical coherence is 1.0. If they agree on none, 
it is 0.0. 

Consider the following group lexicon after 3000 games for the previous simulation 
(with 20 agents), shown in Table 5.6. 

Table 5.6: Group lexicon after 3000 games.

meaning       form    frequency    meaning       form    frequency
[dark]        dato    1.00         [large]       biti    0.80
[light]       pitu    0.60         [small]       dopu    1.00
[red]         gabi    1.00         [green]       gu      0.85
[square]      koti    0.50         [rectangle]   totu    0.65
[left]        toga    0.90         [blue]        ku      0.80
[yellow]      gubo    0.55         [charming]    ge      1.00
[triangle]    bu      0.85         [square]      ba      0.60
[fast]        beke    1.00         [slow]        tu      0.95
[circle]      ke      0.75         [right]       gaba    0.95
[up]          butu    1.00         [down]        ki      0.95



The lexical coherence is at this point equal to 0.835. 
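
This value can be checked directly against the table. The sketch below copies the twenty rows of Table 5.6 as printed (including the meaning label that occurs twice) and averages their frequencies; the list layout and the function name are assumptions made for illustration.

```python
# Rows copied from Table 5.6 as printed: (meaning, form, frequency).
GROUP_LEXICON = [
    ("dark", "dato", 1.00), ("large", "biti", 0.80),
    ("light", "pitu", 0.60), ("small", "dopu", 1.00),
    ("red", "gabi", 1.00), ("green", "gu", 0.85),
    ("square", "koti", 0.50), ("rectangle", "totu", 0.65),
    ("left", "toga", 0.90), ("blue", "ku", 0.80),
    ("yellow", "gubo", 0.55), ("charming", "ge", 1.00),
    ("triangle", "bu", 0.85), ("square", "ba", 0.60),
    ("fast", "beke", 1.00), ("slow", "tu", 0.95),
    ("circle", "ke", 0.75), ("right", "gaba", 0.95),
    ("up", "butu", 1.00), ("down", "ki", 0.95),
]

def lexical_coherence(rows):
    """Average frequency over all form-meaning pairs in a group lexicon."""
    return sum(freq for _, _, freq in rows) / len(rows)

print(round(lexical_coherence(GROUP_LEXICON), 3))  # 0.835
```

The frequencies sum to 16.70 over 20 pairs, which gives the 0.835 quoted above.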

Lexical coherence can be graphed alongside average success (see Figure 5.4). As ex- 
pected, lexical coherence increases and we can see that as coherence increases the suc- 
cess rate increases. 

Does total success imply that all agents use the same lexicon? Not really. To have 
success, the hearer must associate the form used by the speaker with the same meaning. 
But it is not required that the hearer himself prefers to use the same form for the same 
meaning; synonyms may occur. A speaker of British English typically uses the word
pavement, whereas an American prefers sidewalk, even though he understands pavement. 
Thus there can be several forms active in the same population, even though the outcome 
of a game is always successful. We will see later that synonyms do get damped, as is the 
case in human natural lexicons. 

Initially lexical coherence is higher than success, because a game counts as a failure
when the hearer is still acquiring the form-meaning association used by the speaker. So
two agents could have stored the same association, and thus coherence would have
increased, without already having enjoyed the benefit in a successful communication.
However, once success is total, agents no longer make changes based on negative feedback
from failure, simply because there is no failure: even the less common forms are
understood correctly by everybody. Further progress towards more coherence is therefore
only due to the fact that the more common forms occur more often, so that their scores
keep going up as they are used more.

Figure 5.4: This figure shows the evolution of both average success and lexical coherence
for a group of 20 agents and 20 meanings. Total lexical coherence climbs more
slowly once the population has reached a high average success.



5.2 Scaling up 

The associative memory and the score updating introduced in the previous section appear
to allow a group of distributed agents to establish a shared repertoire of form-meaning
pairs. Of course, I still need to show that this mechanism remains adequate when it is
incorporated in a complete game, in which case there is no direct feedback about meaning.
But before doing so, let us see whether the mechanisms are adequate from the viewpoint
of scaling: Can they handle variation in the set of meanings to be expressed? Do they
cope with a changing population?

5.2.1 Coping with new meanings 

In natural languages, new meanings arise every day while other meanings become
irrelevant. For example, none of the terms used for talking about the Internet (e-mail,
surfing, home page, etc.) would have made sense to anyone a few decades ago. On the other
hand, most of us have now lost many categories and concepts for classifying plants,
simply because they are no longer such a prominent part of our urbanised environments.
It follows that a mechanism claiming to explain the origins and acquisition of a lexicon
in a population of agents should cope with a fluctuating set of meanings as well. This
property is moreover crucial in the Talking Heads experiment because new meanings
will continuously arise as the agents encounter new situations in the environment.

Because the Naming Game included ways to handle new meanings from the start,
nothing should have to be changed to handle an increased set of meanings. Let us see
whether the Naming Game indeed copes through the next simulation (see Figure 5.5),
using arbitrarily labeled meanings ([M1], [M2], etc.). In a first phase, the system is
closed and a shared lexicon emerges for the initial set of 20 meanings, as expected. The
group’s lexicon is now as in Table 5.7.

Table 5.7: Group lexicon after first phase. 



meaning   form   frequency    meaning   form   frequency
[M1]      gebo   1.00         [M2]      goge   1.00
[M3]      koto   0.70         [M4]      da     1.00
[M5]      peko   1.00         [M6]      ki     1.00
[M7]      gipe   1.00         [M8]      kedo   1.00
[M9]      do     1.00         [M10]     gige   1.00
[M11]     pi     1.00         [M12]     bu     1.00
[M13]     pa     1.00         [M14]     kipa   1.00
[M15]     depi   0.95         [M16]     pudi   1.00
[M17]     tegi   1.00         [M18]     ba     0.90
[M19]     ko     1.00         [M20]     guda   1.00



In phase 2, a relatively small meaning flux is introduced (one new meaning every
1000 games). As can be seen from Figure 5.5, the population copes with the change. New
words are created and propagate in the population. The group lexicon in Table 5.8 shows
that for newcomers like [M22] or [M25] a total consensus has emerged. Words for the
latest new meanings, [M28] and [M29], still have low frequencies.

Next (phase 3 in Figure 5.5) a much higher meaning flux is imposed (one new meaning
every 100 games). Lexical coherence decreases and average success plummets. There is
not enough time to propagate the new conventions in the group. Note that lexical
coherence drops more slowly than success when the lexicon disintegrates. Coherence is
an average over all meanings, so only the new meanings pull the overall value down.
Success drops more rapidly because of the high rate of failure of games about new
meanings.

The system restores itself when the flux of meaning is brought back to 1/1000 games
(phase 4). Interestingly enough, coherence now increases more slowly than success. The
instability caused by a rapid influx of new meanings has led to many new forms for the
same meanings. These synonyms now spread in the population and lead to a rapid increase
in communicative success. Coherence climbs more slowly because competing synonyms
are only gradually weeded out, based on their frequency of use.

We can conclude that the agent architecture manages to handle an influx of meaning,
as long as the flux stays within certain bounds.

Table 5.8: Group lexicon after second phase.

meaning   form   frequency    meaning   form   frequency
[M1]      gebo   1.00         [M2]      goge   1.00
[M3]      koto   1.00         [M4]      da     1.00
[M5]      peko   1.00         [M6]      ki     1.00
[M7]      gipe   1.00         [M8]      kedo   1.00
[M9]      do     1.00         [M10]     gige   1.00
[M11]     pi     1.00         [M12]     bu     1.00
[M13]     pa     1.00         [M14]     kipa   1.00
[M15]     depi   1.00         [M16]     pudi   1.00
[M17]     tegi   1.00         [M18]     ba     1.00
[M19]     ko     1.00         [M20]     guda   1.00
[M22]     to     1.00         [M23]     de     0.85
[M24]     tabo   0.95         [M25]     piku   1.00
[M26]     ku     1.00         [M27]     pugu   1.00
[M28]     tete   0.35         [M29]     todu   0.35

Figure 5.5: Both average success (every 100 games) and lexical coherence are shown for
an inflow of meanings into a population of 20 agents starting with
20 meanings (phase 1). The inflow is 1/1000 in phase 2, 1/100 in phase 3 and
1/1000 in phase 4.

5.2.2 Lexicon acquisition by virgin agents 

The next question we need to investigate is whether the mechanisms explain how a
lexicon, once it has formed, can be preserved from one generation to the next. This
clearly happens in human populations. Although lexicons show profound change, large parts
are preserved even over very long periods of time. Some linguists even claim that the
roots of certain words still in use today go back to the very beginnings of language,
hypothesised to have been around 50,000 years ago; see Ruhlen (1994). A genetic
solution, where the lexicon is stored in the genetic code and thus transmitted from parent
to child, seems clearly out of the question. Nevertheless, a lot of the early work on
computational modeling of language origins relied on a genetic approach for transmitting
the lexicon, possibly with some additional adaptation; see for example MacLennan &
Burghardt (1993: 603-631). This approach sheds light on the issue of how signaling
systems may evolve in animals but is not applicable to the transmission of human lexicons.
The lexicon of human languages is too diverse and changes too quickly to allow genetic
transmission. So lexicons must somehow be transmitted in a cultural process.

It turns out that the agent architecture I introduced in the previous sections does not
need to be changed at all to obtain a cultural transmission of a lexicon, illustrating the
explanatory power of the model despite its simplicity. New virgin agents entering the
group may occasionally create a new word, if they do not have one themselves, but if a
particular set of words with particular meanings is already strongly entrenched in the
population, these new words have a very low probability of surviving. Instead, the virgin
agents will adopt the words that they abundantly hear in their environment, and the
score of these words goes up quickly.

Here is a computer simulation testing whether this is indeed the case (see Figure 5.6).
We begin with a population of 20 agents and let them develop a shared lexicon for 20
meanings (phase 1). Then I add new virgin agents at regular time intervals, at a rate of 1
every 1000 games (phase 2). A new agent has no knowledge of the existing lexicon and
therefore must acquire the lexicon present in the rest of the group. Figure 5.6 (phase 2)
shows that the population indeed copes. A new member initially causes some failures
in communication, but he quickly picks up the lexicon of the community and success
moves back up. The lexicon does not change; it is stable against minor perturbations.

However, when the birth rate is increased to 1/100 games (phase 3) the population is
less able to cope. Success stays relatively high (70 %), but there are too many new agents
coming in too fast. The lexicon cannot spread sufficiently quickly to the new agents and
therefore starts to disintegrate. In a final stage (phase 4), the birth rate is raised
again, to 1 new agent every 50 games. The population is no longer able to cope with the
influx of new members and disintegrates. If the inflow were brought back to a lower rate,
the population would again establish a shared lexicon. However, the lexicon would now be
a different one from the one that was established before. The dynamical process has
moved from one stable lexical state to another one.

Figure 5.6: Evolution of communicative success with different birth-rates, starting from
a population of 20 agents (phase 1). Next the birth rate is increased from 1
new agent every 1000 games (phase 2) to 1 new agent every 100 games (phase
3), and then to 1 every 50 games (phase 4).

We have seen earlier that the Naming Game scales up with respect to the number of
possible meanings. Now we see that it scales up with respect to the size of the population.
As long as the rate of influx is not too high, the population can keep expanding. The only
constraining factor is that new agents must have sufficient opportunities to acquire the
lexicon present in the group.

5.2.3 Preservation in changing populations 

In human populations, there is not only an influx of new members but also an outflux.
When somebody leaves the community, their knowledge about the lexicon disappears
as well. Nevertheless, a lexicon clearly gets preserved from one generation to the next,
which implies that the know-how is distributed robustly over the agents.

The next computer simulation tests whether this is also true in the Naming Game
model. The simulation starts with a population of 20 agents who are left to develop a
shared lexicon for 20 meanings (phase 1). Then an in- and outflux is introduced (phase
2 in Figure 5.7) with one new virgin agent coming in and another agent leaving the
population every 1000 games. The new agent has to acquire the lexicon present in the group.
Average success therefore dips but is quickly regained. In fact, the population can be
completely renewed without affecting the lexicon at all. After 16000 games nine agents
(roughly 50 %) have been replaced, but the lexicon has not changed. So the Naming Game
model not only explains the formation of a lexicon but also its transmission. This
transmission is entirely cultural: new agents enter the population with no prior knowledge.




Figure 5.7: Success and population size are shown for a series of 35,000 language games.
The population starts with 20 agents and 20 meanings (phase 1). Then an
influx and outflux is introduced at the rate of 1/1000 games (phase 2). The
lexicon maintains itself. In phase 3 agents enter and leave at the rate of 1/100
games. Success lowers. In phase 4 the rate of change is brought back to 1/1000
games and success is regained.

Can we increase the flux in the population indefinitely? This is examined in phase 3
of Figure 5.7. In this phase a higher flux has been introduced: one agent is added and
one removed every 100 games. Success goes down, although it is still maintained at a high
level. The lexicon is still not changing. However, previous examples have already shown
us that if we continue to increase the rate, the lexicon would disintegrate. Too many
new agents would be flowing in, who do not have a lexicon yet. On the other hand, if
we bring the rate of change back down to 1/1000 games (phase 4 in Figure 5.7), success
regenerates.

These simulations illustrate how we can study lexicon transmission using a language
game approach. We have to set up an in- and outflow of the agents and study the impact
on their communicative success and their lexicon. In principle, we should not have to
change the architecture of the individual agents, and indeed I have not done so. Language
acquisition is such an integral part of language use that a realistic agent architecture
must intimately integrate both capacities from the start. Of course, at this stage we have
only tested this with the agents getting direct feedback about meaning; we must still test
whether it will continue to work with the physically instantiated Talking Heads.



5.3 Self-organisation 

These various simulations show that the Naming Game embodies robust mechanisms 
for the emergence of a lexicon and we will use it as a core component for the Talking 
Heads experiment. In retrospect, the following mechanisms are crucial for the success 
of the model: 

1. Agents must be able to represent multiple associations (one form can be associated 
with many meanings and one meaning with many forms). Multiple associations 
naturally arise in a population of distributed agents because an agent may create 
a new form not knowing that one already exists in the population, or guess a 
different meaning for a form than the one intended by the speaker. I will discuss 
such examples in more detail later. 

2. An agent must be able to record a score for each association. The score is necessary 
for the agent to decide which meaning or which form should be preferred in a 
particular interaction. When random choices are made lexicons do not converge. 

3. Agents must be able to create new words when no words are available yet. When
there is a fixed set of words, the problem is much harder and the distributed search
process may get stuck in local minima. Lexical systems must be able to cope with
a steady influx of new meanings, so restricting the set of words from the beginning
would be odd.

4. Agents must perform lateral inhibition, which means that they must decrement 
the score of competitors to the form-meaning pair which won a competition. This 
is necessary to achieve convergence. 

5. Agents must get feedback in the case of failure. At the moment the feedback is 
direct, but I will soon embed the naming game into a more complex guessing game 
in which feedback comes from the externally observed outcome of the language 
game as opposed to the direct transmission of the intended meaning. 
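
The five mechanisms can be put together in a small Python sketch. It is not the software
used for the simulations reported here: the score step `DELTA`, the initial score
`INITIAL`, the three-syllable word generator and all class and function names are
assumptions made for illustration. The closing lines repeat, in miniature, the flux
experiment of Section 5.2.3 (one agent replaced every 1000 games) and count how many
preferred words survive a complete renewal of the population.

```python
import random
from collections import Counter

DELTA = 0.1     # assumed step for raising or lowering a score
INITIAL = 0.5   # assumed score of a newly stored association
SYLLABLES = [c + v for c in "bdgkmnpt" for v in "aeiou"]


class Agent:
    def __init__(self):
        self.scores = {}  # mechanism 1: meaning -> {form: score}

    def best_form(self, meaning):
        forms = self.scores.get(meaning)
        return max(forms, key=forms.get) if forms else None  # mechanism 2

    def best_meaning(self, form):
        known = [m for m, forms in self.scores.items() if form in forms]
        return max(known, key=lambda m: self.scores[m][form]) if known else None

    def store(self, meaning, form):
        self.scores.setdefault(meaning, {}).setdefault(form, INITIAL)

    def lower(self, meaning, form):
        self.scores[meaning][form] = max(0.0, self.scores[meaning][form] - DELTA)

    def reward(self, meaning, form):
        # Mechanism 4, lateral inhibition: competitors share the meaning or the form.
        for other_form in self.scores[meaning]:
            if other_form != form:
                self.lower(meaning, other_form)
        for other_meaning, forms in self.scores.items():
            if other_meaning != meaning and form in forms:
                self.lower(other_meaning, form)
        self.scores[meaning][form] = min(1.0, self.scores[meaning][form] + DELTA)


def play(speaker, hearer, meaning, rng):
    form = speaker.best_form(meaning)
    if form is None:  # mechanism 3: invent a word when none is available
        form = "".join(rng.choice(SYLLABLES) for _ in range(3))
        speaker.store(meaning, form)
    if hearer.best_meaning(form) == meaning:
        speaker.reward(meaning, form)
        hearer.reward(meaning, form)
        return True
    speaker.lower(meaning, form)  # mechanism 5: direct feedback after a failure
    hearer.store(meaning, form)   # the hearer is told the intended meaning
    return False


def run(agents, meanings, games, rng, turnover_every=None):
    successes = 0
    for game in range(1, games + 1):
        if turnover_every and game % turnover_every == 0:
            agents.pop(0)            # the longest-serving agent leaves
            agents.append(Agent())   # a virgin agent enters
        speaker, hearer = rng.sample(agents, 2)
        successes += play(speaker, hearer, rng.choice(meanings), rng)
    return successes / games


def group_lexicon(agents, meanings):
    # Most widely preferred form for each meaning.
    return {m: Counter(a.best_form(m) for a in agents).most_common(1)[0][0]
            for m in meanings}


rng = random.Random(0)
meanings = list(range(20))
agents = [Agent() for _ in range(20)]
run(agents, meanings, 20000, rng)                        # closed population
before = group_lexicon(agents, meanings)
flux_success = run(agents, meanings, 20000, rng, turnover_every=1000)
after = group_lexicon(agents, meanings)                  # all 20 founders replaced
preserved = sum(before[m] == after[m] for m in meanings)
print(f"success during flux: {flux_success:.2f}")
print(f"{preserved} of {len(meanings)} preferred words unchanged")
```

Removing one ingredient at a time (for instance replacing `best_form` by a random choice,
or dropping the two inhibition loops in `reward`) is a simple way to explore the claim
that each mechanism is needed.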

When any of these characteristics of the agent architecture or the game is eliminated,
the system does not work. Communicative success does not climb, convergence will not
go beyond a small percentage, the size of the lexicon explodes, and so on. The fact that
these architectural properties are crucial and non-trivial to discover strongly suggests
that similar mechanisms must be in place in the emergence of human lexicons.

It is also important to stress what is not in the model. The mechanisms used by the
agents are deliberately kept as simple as possible. Complexity should arise only from the
enactment of simple construction rules. Agents do not go through complex reasoning
about words; they simply store the new associations they encounter and rely on the
updating processes to weed out wrong hypotheses.

5.3.1 Winner-take-all processes 

The most remarkable and at first mysterious property of the Naming Game model is that
the agents somehow reach a consensus without any central supervisor. They do not do
this by having a general overview or by changing their internal parameters so as to
become more conservative as the lexicon solidifies. It is solely due to the subtle
interaction between language use, which gradually becomes uniform, and each agent’s
adaptation to the language heard in the environment. If a certain word comes to be
preferred by a group of agents for a certain meaning, its frequency of use goes up so that
others encounter this word more often and hence their scores for that word continue to
increase as well. The more agents use a word, the higher its chance of success and the
more it will be used. This effect is further strengthened by lateral inhibition: the
scores of competing associations decrease, making it less likely that they will win in
the future. This positive feedback therefore introduces an autocatalytic
(self-reinforcing) effect until the population locks into an equilibrium state. 6

To follow better how a consensus gradually emerges, I will visualise the competition 
between different words for the same meaning in a meaning-form (MF) competition 
diagram, such as the one in Figure 5.8, which monitors the frequencies of the different 
forms in use for one meaning. The diagram shows clearly the struggle between different 
forms until one form (pe) emerges as the winner. When we later study grounded lexicon
formation processes, we will see that the competition becomes much more complex and 
the whole system is in constant evolution. A form-meaning pair which is dominating 
may become weaker because its meaning fails to pick up the right referent in a new 
context. This in turn may trigger the creation of new words or the resurgence of existing 
words. 
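
The quantity plotted in such a diagram is easy to compute. The sketch below, in Python,
takes the form each agent currently prefers for one meaning and returns the relative
frequency of every form. Only the winning word pe comes from Figure 5.8; the competing
words and the three snapshots are invented for illustration.

```python
from collections import Counter


def form_frequencies(preferred_forms):
    """Relative frequency of each form preferred for a single meaning."""
    counts = Counter(preferred_forms)
    total = len(preferred_forms)
    return {form: n / total for form, n in counts.items()}


# Hypothetical snapshots of the preferred form of ten agents at three moments.
snapshots = [
    ["pe", "ka", "to", "ka", "pe", "mi", "to", "pe", "ka", "pe"],
    ["pe", "pe", "to", "ka", "pe", "pe", "to", "pe", "ka", "pe"],
    ["pe"] * 10,
]
for moment, forms in enumerate(snapshots):
    print(moment, form_frequencies(forms))
```

Plotting one curve per form over many such snapshots gives the meaning-form competition
diagram.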

5.3.2 Collective behaviour and self-organisation 

Biology is full of examples where structures spontaneously self-organise from the
uncoordinated activity of distributed elements through a winner-take-all process. Each
time the same basic components as in the Naming Game model are seen: Random behaviour
creates various possibilities and the reinforcement of some of these variations through
positive feedback creates an autocatalytic effect. Perhaps the clearest examples can be
found in the collective behaviour of social insects, such as the formation of nests by
termites, although beautiful explanations have also been reported for the formation of
patterns on sea shells, the growth of cell tissue, the aggregation of individual cellular
slime mold amoebae into a slug, the flocking and collective movement of birds or mammals,
etc. (Meinhardt 1982). A classical example of collective behaviour, first developed
by Jean-Louis Deneubourg, is the formation of paths in an ant society through mass
recruitment (Pasteels & Deneubourg 1987).

6 Such positive feedback loops and the stability criteria associated with them have been widely studied in
non-linear dynamical systems and applied to chemical and biological processes. See Babloyantz (1986).

Figure 5.8: Simulation with a population of 20 agents. The meaning-form competition
diagram shows the frequency of all competing forms for a single meaning.
We see a winner-take-all situation with one word (pe) dominating.

When ants carry food or other materials, they organise themselves in a chain which is 
typically the shortest path between the source and the nest. These chains can sometimes 
be surprisingly long (20 meters is quite common for European ants) and are maintained 
as long as the food supply lasts. The whole process has many intriguing properties. First 
of all, there is no central planning agency that regulates which food sources are to be 
explored. The coherence and co-ordination between hundreds or sometimes thousands 
of ants is established in a completely distributed fashion. There is no dependence on 
individual ants. Ants can be removed from a path or new ones can be introduced ran- 
domly without too much interference for the stability of the path as a whole. The paths 
are robust. If objects are put in the way or if the path is destroyed, the ants manage 
to reestablish it in a relatively short time span. The paths are adaptive. If the food sup- 
ply terminates, the path disintegrates and a new path will appear linking the ants to an 
alternative food source. 

We see here many of the properties found in natural languages and integrated in the
Naming Game model: absence of central planning, no critical dependence on a single
element, resilience to influx or outflux of elements, and adaptation to changing
circumstances. Even more interestingly, the ants manage to establish these dynamic paths
by a process which is similar to the language formation process used in the Naming Game,
namely a positive feedback loop having an autocatalytic effect. An individual ant
appears to move around in a random fashion while searching for a source of food. When
a food source is discovered, the ant returns to the nest using a global landmark like the
sun. The food-carrying ant also deposits a chemical substance known as a pheromone
as it travels back to the nest. This pheromone influences the otherwise random movement
of the other ants, in that ants are attracted by the pheromone. Thus more ants are
drawn to the path and hence led to the food source. As these ants in turn go back to the
nest they also deposit pheromone. This gives the self-reinforcing, autocatalytic effect:
The more ants are on the path, the more pheromone is deposited, and therefore the stronger
the attraction to the other ants. Very soon all the ants which were sufficiently close to
the path form a chain. There is no central planning agency needed and the whole system
does not depend on an individual ant. The order is emergent.

These simple mechanisms also explain other features of the process. When the food
source is depleted, the ants going back no longer deposit pheromone. And because the
pheromone is a chemical that evaporates, it will soon have disappeared and consequently
the ants will return to a random movement. When a path is interrupted because obstacles
are put in the way or because the pheromone is temporarily removed by an experimenter,
the ants revert to a random movement. This introduces a random search process
which will eventually lead to the discovery of a connection and the reestablishment of
the path. When two ants find two food sources, one closer than the other, the society will
go for the closer source, not because the ants exchange sophisticated signals but because
the trail leading to the closer source will be amplified faster. Adaptivity is explained in
terms of errors in following the path. Although ants are attracted to the pheromone, the
attraction is only partial and very often (how often depends on the species) they will go
astray. This sloppiness is however a source of new discoveries. When a lost ant hits upon
a new food source, the path formed by the whole society may gradually shift, particularly
if the new source is more abundant.
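
The amplification of the trail to the closer source can be illustrated with a small
deterministic sketch in Python. It tracks only the expected pheromone level on two
competing trails. The choice function is of a form often used in models of ant trail
following, but its parameters `k` and `n`, the evaporation rate and the trail lengths are
assumptions chosen for illustration, not values taken from the studies cited above.

```python
def choose_short(tau_short, tau_long, k=5.0, n=2.0):
    """Probability that an ant takes the short trail, given both pheromone levels."""
    a = (k + tau_short) ** n
    b = (k + tau_long) ** n
    return a / (a + b)


def simulate(steps=2000, len_short=1.0, len_long=2.0, evaporation=0.01):
    tau_short = tau_long = 0.0
    history = []
    for _ in range(steps):
        p = choose_short(tau_short, tau_long)
        history.append(p)
        # Pheromone evaporates everywhere; the expected deposit on a trail is
        # proportional to its traffic and inversely proportional to its length,
        # because round trips on the shorter trail are completed more often.
        tau_short = (1 - evaporation) * tau_short + p / len_short
        tau_long = (1 - evaporation) * tau_long + (1 - p) / len_long
    return history


history = simulate()
print(f"share of ants on the short trail: start {history[0]:.2f}, end {history[-1]:.2f}")
```

Both trails start without pheromone, so the first choice is even. The small initial
advantage of the short trail is then amplified by the positive feedback loop.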

5.3.3 Increasing-returns economics 

Self-organisation is not unique to biological phenomena. On the contrary, similar
situations have been intensively studied in economics, where the complex adaptive systems
paradigm has recently also led to many interesting new insights, as discussed in Arthur
(1996). For certain types of products, particularly in information technology where the
cost of manufacturing and distribution is negligible compared to the cost of design, a
winner-take-all situation can be observed. One product, for example a particular
operating system or a particular microprocessor, comes to dominate the market.

Brian Arthur and others have analysed these economic situations and identified a
positive feedback loop as being the ultimate cause. The more customers choose a product,
the more others are attracted to it, particularly because other suppliers develop useful
derivative products. Prices can be decreased, keeping newcomers out of the market, and
customers get locked in, unable to move to other suppliers because they have invested
too much and have become dependent. For the companies who manage to manoeuvre their
products into such a situation there is a bonanza of increasing returns. This contrasts
with the decreasing returns familiar from traditional equilibrium economics, where there
is a damping of profits due to proliferating production and distribution cost as a
product’s market share increases.
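
The lock-in effect can be made concrete with a toy adoption model in Python, written in
the spirit of the analyses mentioned above. The two customer types, their base payoffs
and the size of the returns are assumptions chosen for illustration.

```python
import random


def simulate_adoption(steps, returns, rng):
    """Customers arrive one by one and pick product A or B."""
    adopters = {"A": 0, "B": 0}
    base = {"R": {"A": 1.0, "B": 0.0},   # R-type customers lean towards A
            "S": {"A": 0.0, "B": 1.0}}   # S-type customers lean towards B
    for _ in range(steps):
        kind = rng.choice("RS")
        # Payoff = own taste + a bonus that grows with the number of earlier adopters.
        payoff = {p: base[kind][p] + returns * adopters[p] for p in "AB"}
        adopters[max("AB", key=payoff.get)] += 1
    return adopters


rng = random.Random(0)
print("no returns:        ", simulate_adoption(10000, 0.0, rng))
print("increasing returns:", simulate_adoption(10000, 0.1, rng))
```

Without returns the market splits according to taste. With increasing returns a chance
lead of about ten adopters is enough to make every later customer choose the same
product, whatever their own taste.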

5.3.4 Lessons from nature 

The analogies between self-organisation in language and other fields are important for
three reasons. First of all, if self-organisation is ubiquitous in nature and has successfully 
explained so many phenomena, its incorporation into a model of language becomes inde- 
pendently motivated, and therefore the explanatory force of the model increases. What 
is new and different is that the principle is applied to a non-material self-organising 
entity, but nevertheless the same sort of dynamics can be seen. 

Second, the large arsenal of mathematical tools and analysis techniques developed
in the sciences of complexity over the past decades can be carried over to the study of
language. For example, the mathematical models of economists like Arthur or biologists
like Deneubourg help us develop mathematical explanations of why language reaches
coherence if autocatalysis is present.

Third, the analogies suggest many aspects of the mechanisms which might be relevant for
language. For example, the errors ants make in following a trail allow them occasionally
to discover better food sources. Could such stochasticity also play a role in the
adaptive capabilities of language? The chaotic regime seen in many natural systems is
known to be a source of new order (Kaneko 1996). Could language innovation also be
explained that way? In other words, is it possible that language may occasionally exhibit
chaotic dynamics out of which new order emerges?



5.4 Lexical dynamics 

This discussion begins to illustrate a major theme of the present book, namely that lan- 
guage as a macroscopic phenomenon can be viewed as a complex adaptive system with 
the same characteristics as other complex adaptive systems. 

It is well known that the dynamics of language change are related to the dynamics
of the underlying population. Basically we can see two phenomena. First, human
populations are not fixed forever. New children without any knowledge of the
language are born and other members die, taking knowledge about the language away
with them. Populations renew at a certain rate which is known to have a significant
impact on language. If the population changes quickly, a language evolves more quickly
and subsystems may even destabilise. For example, linguists have argued that English
lost its case system due to the Black Plague, which decimated the population so that there
was not enough opportunity for children to acquire the existing conventions.

Second, human populations mix. Throughout the history of mankind there have been
migrations or intense contact between geographically diverse groups. This in turn affects
the languages of the groups. When a given population splits into groups that no longer
have contact, their languages start to diverge. Conversely, when there is an intense
and prolonged contact between languages, structures from one language get adopted
by the other and vice versa. The degree of adoption depends on which group is dominant.
Sometimes groups adopt another lexicon while retaining their own syntax, and
sometimes they take over the syntax while retaining their own lexicon. 7

These phenomena are fascinating and interesting from the viewpoint of language evo- 
lution, and may even explain some of the characteristics of human languages. Linguistic 
systems must be such that they can be transmitted from one generation to the next, oth- 
erwise they will not survive. In the Talking Heads experiment, new agents may enter 
into the group at any time and agents are geographically distributed. The local inter- 
actions with humans at a particular site, which is a kind of language contact between 
human and artificial populations, may impact the evolving lexicons and ontologies. 

5.4.1 Spatially distributed naming games 

Language game models provide us with fantastic new tools to study language transmission
and language contact: We can introduce a particular dynamics in the population in
a controlled way and then study the impact on the dynamics of the language itself. I now
focus on such a model to investigate the impact of the migratory dynamics of a population
on the dynamics of language. We can introduce a two-dimensional grid and randomly assign
every agent a position on this grid. The position assignment can be modulated
so that the agents form clusters (Figure 5.9). Such a population structure can be thought
of as a geographical distribution in space but might as well represent a social, genetic
or economic structure. We could even envision models integrating several of these
alternative dimensions. The physical Talking Heads network connecting installations in
different geographical locations makes it possible to run these experiments for real.

In earlier simulations agents were randomly picked out of the population. Now we
can base the probability with which two agents interact on the distance between them and
on an interaction factor, which determines the weight of the distance. If the interaction
factor increases, the role of distance becomes more important and interactions tend to
reflect the spatial clustering more. Based on this parameter we can study the evolution
of language when communications between clusters of agents increase.
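
One possible way to turn distance and interaction factor into interaction probabilities
is sketched below in Python. The exponential weighting, the function names and the three
example positions are assumptions; the text only requires that a larger interaction
factor gives distance more weight.

```python
import math
import random


def interaction_probabilities(positions, speaker, factor):
    """Probability of each other agent being chosen as hearer for `speaker`."""
    sx, sy = positions[speaker]
    weights = {}
    for agent, (x, y) in positions.items():
        if agent != speaker:
            distance = math.hypot(x - sx, y - sy)
            weights[agent] = math.exp(-factor * distance)  # assumed weighting
    total = sum(weights.values())
    return {agent: w / total for agent, w in weights.items()}


def pick_hearer(positions, speaker, factor, rng):
    probabilities = interaction_probabilities(positions, speaker, factor)
    agents = list(probabilities)
    return rng.choices(agents, weights=[probabilities[a] for a in agents])[0]


# Two agents in one cluster (a1, a2) and one agent far away (a3).
positions = {"a1": (0.0, 0.0), "a2": (1.0, 0.0), "a3": (10.0, 0.0)}
for factor in (0.0, 0.5, 2.0):
    print(factor, interaction_probabilities(positions, "a1", factor))
print(pick_hearer(positions, "a1", 0.5, random.Random(0)))
```

With an interaction factor of zero the hearer is picked uniformly, as in the earlier
simulations. As the factor grows, almost all games of a1 are played inside its own
cluster.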

Initially we let each subgroup evolve towards a shared communication system. Success is
never total because there are occasional interactions with members of other communities,
although inner-cluster communication reaches total success. Inspection of
the agent lexicons reveals that agents develop a stable language within their cluster,
but also a second language, an interlingua, which is weaker but shared among the
different clusters. This interlingua will become stronger as more agents interact between
clusters. Thus we observe language diversity due to the spatial distribution but at the
same time the rise of an interlingua.

7 A representative example of empirical investigations into language dynamics is contained in Nichols (1992).
See also Romaine (1988).

Figure 5.9: The figure shows the spatial distribution of a set of 20 agents. There is
clustering around three centers.

The following vocabularies illustrate this point clearly. The first vocabulary is taken 
from an agent from the leftmost cluster in Figure 5.9. All the words associated with a 
particular meaning are shown together with their score. 

{}[M0]: kube[0.88] gutida[0.00] moko[0.00]
{}[M1]: nugini[0.97] gi[0.83] majiba[0.00]
{}[M2]: go[0.98] ta[0.00]
{}[M3]: moma[0.98] ti[0.00] nudo[0.00] kene[0.00]
{}[M4]: nebu[0.98] me[0.83]
{}[M5]: tine[0.98]
{}[M6]: bo[0.94] babige[0.82]
{}[M7]: mepabo[0.97] jabeto[0.71] di[0.00]
{}[M8]: kude[0.90] nado[0.00]
{}[M9]: pe[0.94] da[0.00]
{}[M10]: na[0.94] nuguge[0.90] pa[0.80] ne[0.00]
{}[M11]: mu[0.98] gite[0.00] paku[0.00]
{}[M12]: gema[0.96] do[0.33] gapu[0.00]
{}[M13]: ja[0.67] jo[0.00]
{}[M14]: dodine[0.88] pibo[0.83] gije[0.00] pupeto[0.00]
{}[M15]: jiti[0.94] gato[0.64]
{}[M16]: bimogu[0.98] ba[0.00]
{}[M17]: bapi[0.96] ki[0.81] damuti[0.00]
{}[M18]: kutume[0.94] bu[0.00] ni[0.00]
{}[M19]: mugu[0.95] NINU[0.50] pi[0.43] ji[0.00] tu[0.00]

This is the vocabulary for an agent taken from the rightmost cluster in Figure 5.9: 

{}[M0]: gutida[0.79] kube[0.00] 
{}[M1]: gi[0.89] matu[0.85] pumoni[0.00] 
{}[M2]: go[0.95] ta[0.20] 
{}[M3]: kene[0.89] moma[0.82] nudo[0.00] koko[0.00] 
{}[M4]: nebu[0.97] me[0.00] bukugo[0.00] 
{}[M5]: tine[0.90] 
{}[M6]: babige[0.93] bo[0.00] 
{}[M7]: mepabo[0.90] junipe[0.75] di[0.00] 
{}[M8]: nado[0.96] kude[0.80] puto[0.00] 
{}[M9]: da[0.86] nine[0.71] 
{}[M10]: pa[0.87] 
{}[M11]: gite[0.88] 
{}[M12]: gema[0.90] gapu[0.87] 
{}[M13]: jo[0.96] ji[0.89] ja[0.00] 
{}[M14]: dodine[0.96] pupeto[0.56] pibo[0.00] gije[0.00] 
{}[M15]: jiti[0.97] gato[0.46] 
{}[M16]: bimogu[0.97] ba[0.83] pipebe[0.00] 
{}[M17]: ki[0.97] bapi[0.00] ke[0.00] 
{}[M18]: ni[0.94] kutume[0.80] moko[0.80] mekami[0.00] 
{}[M19]: ninu[0.81] mugu[0.00] pi[0.00] 

Some words (for example tine for [M5] or go for [M2]) are shared. But generally there are 
at least two words. One word is used preferentially inside the cluster, the other is known 
but preferentially used by members of another cluster. Thus the word mugu for [M19] is 
preferred by the agent in the first cluster and known but not preferentially used by 
the agent in the second cluster. Conversely, ninu is preferred for the same meaning by 
the agent in the second cluster, although he also knows mugu. The word mepabo for [M7] 
belongs to the interlingua. Both agents know and use it, but each has a strong alternative: 
jabeto for the first agent and junipe for the second. 
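The comparison made in this paragraph can be reproduced mechanically. The sketch below copies three entries from the two vocabularies above and reports, for each meaning, whether the two agents prefer the same word. The data layout and the function names are assumptions made for illustration; "preferred" is taken to mean the word with the highest score.

```python
# Excerpts from the two vocabularies above: meaning -> {word: score}
left_agent = {
    "M5": {"tine": 0.98},
    "M7": {"mepabo": 0.97, "jabeto": 0.71, "di": 0.00},
    "M19": {"mugu": 0.95, "ninu": 0.50, "pi": 0.43, "ji": 0.00, "tu": 0.00},
}
right_agent = {
    "M5": {"tine": 0.90},
    "M7": {"mepabo": 0.90, "junipe": 0.75, "di": 0.00},
    "M19": {"ninu": 0.81, "mugu": 0.00, "pi": 0.00},
}

def preferred(vocabulary, meaning):
    """The word with the highest score for this meaning."""
    words = vocabulary[meaning]
    return max(words, key=words.get)

def compare(first, second):
    """For each meaning, report whether both agents prefer the same word."""
    report = {}
    for meaning in first:
        a, b = preferred(first, meaning), preferred(second, meaning)
        report[meaning] = "shared" if a == b else f"{a} vs {b}"
    return report

report = compare(left_agent, right_agent)
```

For these entries the report marks [M5] and [M7] as shared and [M19] as a split between mugu and ninu.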

5.4.2 Language contact 

When the interaction factor increases, we see further differentiation because there is less 
communication between clusters. When it is decreased, we see more coherence because 
there is more intercluster communication. Thus we can effectively tune divergence or 
convergence in the simulations based on the probability of interaction between commu- 
nities (clusters) of agents. The effect of increased language contact and hence conver- 
gence is demonstrated in Figure 5.10. The simulation starts from the situation described 
earlier with three clusters of agents that have each evolved a lexicon. The interaction 
between clusters is initially very weak. At some point (after 4000 games) the intercluster 
communication is increased drastically. At first there is a drop in communicative success, 
but then total communicative success is reached again. 




Figure 5.10: Evolution of average communicative success per 25 games in a group of 
agents with first weak (phase 1) and then strong interactions (phase 2). 

However, this general evolution hides the more interesting developments. Figure 5.11 
shows the evolution of coherence for each cluster (a, b, c) separately and also for the 
total set of agents. As long as the agents have relatively little contact, total coherence 
is low although the lexical coherence within each cluster is high. Total coherence starts 
to increase with increased contact. Coherence in each cluster diminishes somewhat be- 
cause the agents in the cluster are in the process of accommodating to the global lexicon. 
This means that the languages of the different groups are in the process of merging due 
to the increased language contact. 

Simulations show that, just as in human languages, increased contact causes at first 
a rapid increase in bilingualism, then a gradual mixing of the languages, and, if the 
contact continues, an evolution towards complete coherence. The more rapidly contact 
is increased, the faster the three phases can be observed. 
Figure 5.11: Evolution of coherence in the total population and in the individual clusters, 
shown on the same scale as the previous figure. When contact is increased 
(phase 2), global coherence begins to rise steadily. 



5.5 Conclusions 

A population of agents following a simple set of behaviour rules and using an associative 
memory can give rise to a shared repertoire of form-meaning associations, giving the 
agents a total average success in communication. Once a shared repertoire comes into 
existence, it locks into an equilibrium state and gets transmitted from one generation to 
the next in a cultural process, as long as the rate of population change is not too high. 
The population can also cope with an in- and outflux of meanings, in the sense that the 
lexicon contracts or expands in relation to the demands from an evolving set of possible 
meanings. 

The mechanisms I have proposed here for the Naming Game are remarkable in many 
ways. They clearly show that a shared set of conventions can arise without an omniscient 
central co-ordinator and without any prior knowledge of the lexicon built into the agents. 
The Naming Game also demonstrates a new way to model and thus investigate linguis- 
tic phenomena. Existing formal models of language, such as generative grammars, only 
model static competence of a single idealised speaker in a homogeneous language com- 
munity. Using the framework of language games played by populations of agents, we 
can model the emergence and evolution of language in an inhomogeneous community 
and study language use as well as change through language contact. 
The Naming Game is a minimal model of communication between agents and far 
removed from the full complexity of human natural language. Moreover, we made a 
number of simplifying assumptions, thus putting up scaffolds to construct this initial model. 
The most important assumption was that the meaning of a word can be unambiguously 
known by the speaker and hearer independently of language. This assumption is of 
course not valid for human beings, and neither is it valid for the Talking Heads. 
The problem of how a physically embodied, situated agent might refer to objects using 
language is extraordinarily difficult. If we then further want to find out how a group of such 
agents might autonomously bootstrap a language system, the task seems almost 
insurmountable. That is why I proposed earlier on to start by dividing this task into its three 
main subtasks along the lines of the semiotic square (Figure 6.1). The previous chapters 
each focused on one of these tasks. Chapter 3 introduced perceptual mechanisms 
to process the raw image, segment the scene, derive characteristics of each segment, 
and give feedback by pointing to the referent. Chapter 4 studied the categorisation 
mechanisms needed for conceptualising a scene and thus for generating the possible meanings 
of a verbal communication. Chapter 5 looked at how agents can lexicalise meanings and 
build up a sufficiently shared lexicon to engage in verbal interactions. 

Given that we now have reasonable solutions for these basic processes, at least for their 
simplest instantiations, we can now start to put the pieces together and thus study 
the complete guessing game. I will proceed in two steps. First I will put the lexical layer 
and the conceptualisation layer together in this chapter, and then I will ground the whole 
system by coupling the conceptualisation layer to the perceptual layer in Chapter 7. 

Another technique I proposed earlier on for handling the enormous challenges 
addressed in this book is to scale up gradually. I will follow this strategy as well. In this 
chapter, I will assume that the referent and the perceived image are the same. This 
implies that we are really dealing with semiotic triangles as opposed to semiotic squares 
(Figure 6.1). I will start simulations with only 2 agents and then scale up to a larger 
number. This increases the degree of synonymy in the lexicon. I will furthermore start 
by letting the agents consider only the most salient channel, so that they much more 
easily guess the same category for the same scene. Then I will scale this up so that the 
agents consider more sensory channels and hence more categories. This increases 
the degree of ambiguity in the lexicon. Both synonymy and ambiguity are sources of 
incoherence and we will have to make sure that agents still manage to be successful 
despite these. 

This chapter shows that agents still manage to bootstrap a shared lexicon due to care- 
fully established feedback couplings between the different processing layers introduced 
in the previous chapters. The language game gives feedback to the lexical layer so that 
words become preferred that are understood by others. The lexical layer gives feedback 
to the conceptual layer so that categories become preferred that have been successfully 
lexicalised. Each layer is a selectionist system that generates possible ways to solve a 
subproblem, of which some are kept and others discarded based on feedback of their use. 
I will examine in this chapter whether these couplings indeed cause a coordination of 
the different internal layers in a single agent and whether they lead to shared ontologies 
and lexicons. 



6.1 Defining the Guessing Game 

The guessing game was already introduced in Chapter 2. Here is a first example, 
game 500, in which a2 plays the role of speaker and a1 the role of hearer. The game is about 
the scene in Figure 6.2. The topic is the rectangle labeled 1. The sensory values (after 
sensor-scaling) for the segments in Figure 6.2 are shown in Table 6.1. The last row shows 
the saliency of each channel for the topic, segment 1. Clearly the grayscale channel is the 
most salient. 

Table 6.1: Sensory data for the scene shown in Figure 6.2. 

obj       HPOS  VPOS  HEIGHT  WIDTH  GRAY  AREA 
0         0.66  0.95  0.01    0.71   0.19  0.27 
1         0.69  0.83  0.07    0.33   0.97  0.21 
2         0.99  0.87  0.54    0.72   0.22  0.57 
saliency  0.03  0.05  0.07    0.39   0.75  0.06 



All rectangles are relatively close to each other and have more or less the same height 
and width. But the grayscale channel is clearly the most salient because rectangle-1 is much 
darker than the others. I assume that there are only two agents in the population and 
that they always use only the most salient channel to conceptualise the scene. The speaker 
and hearer have to traverse only two sides of the semiotic square (Figure 6.1) because we 
assume that the perceived image and the object being referred to are identical for both agents. 
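The saliency row of Table 6.1 can be approximated with a few lines of code. The sketch assumes that the saliency of a channel is the smallest absolute difference between the topic's value on that channel and the value of any other object; this is an assumption, not a definition taken from this section. Applied to the rounded values in the table it agrees with the printed saliency row to within 0.01 and singles out the same channel.

```python
CHANNELS = ["HPOS", "VPOS", "HEIGHT", "WIDTH", "GRAY", "AREA"]

# Sensory values from Table 6.1, one list per object, in CHANNELS order.
scene = {
    0: [0.66, 0.95, 0.01, 0.71, 0.19, 0.27],
    1: [0.69, 0.83, 0.07, 0.33, 0.97, 0.21],
    2: [0.99, 0.87, 0.54, 0.72, 0.22, 0.57],
}

def saliency(scene, topic):
    """Assumed measure: per channel, the minimum distance between the
    topic's value and the value of every other object in the scene."""
    result = {}
    for i, channel in enumerate(CHANNELS):
        result[channel] = min(
            abs(scene[topic][i] - values[i])
            for obj, values in scene.items()
            if obj != topic
        )
    return result

topic_saliency = saliency(scene, topic=1)
most_salient = max(topic_saliency, key=topic_saliency.get)
```

For the topic, segment 1, the most salient channel comes out as GRAY, with a value of about 0.75.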

Figure 6.1: The semiotic square becomes a triangle when the perceived image and the 
referent in the real world are assumed to be identical (image segment = referent). 
6.1.1 Example of a coupled game 

The speaker first plays a Discrimination Game traversing the semantic side of the square, 
going from the referent rectangle-1 to a possible meaning [gray 0.5-1.0]. He then plays 
a Naming Game traversing the lexical side of the square to find the word pokuneso for 
this chosen meaning. The hearer traverses the lexical side of the triangle in the other 
direction to interpret the word pokuneso as [gray 0.5-1.0], and then identifies the referent 
by filtering the objects in the context with this meaning. Only rectangle-1 remains, so 
the game succeeds. The whole game is reported by the commentator as follows: 

Game 500 

a2 is the speaker, a1 is the hearer. 
a2 segments the context into 3 objects: 
rectangle-0 rectangle-1 rectangle-2 
a2 chooses rectangle-1 as the topic 
a2 categorises the topic as [GRAY 0.5-1.0] 
a2 says: pokuneso 

a1 interprets pokuneso as [GRAY 0.5-1.0] 
a1 points to rectangle-1 
a2 signals OK 

The game is perfectly successful because both agents associate the word pokuneso with 
[gray 0.5-1.0] (dark) and they perceive the scene in the same way. 
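The flow of game 500 can be sketched in a few lines. The sketch is deliberately reduced: it uses only the GRAY values of Table 6.1, it gives both agents the same one-word lexicon, and the speaker simply looks for a lexicalised category that singles out the topic instead of playing a full Discrimination Game. All function names are hypothetical.

```python
# GRAY values for the scene of game 500, taken from Table 6.1.
context = {
    "rectangle-0": {"GRAY": 0.19},
    "rectangle-1": {"GRAY": 0.97},
    "rectangle-2": {"GRAY": 0.22},
}

# Association held by both agents; a category is (channel, low, high).
lexicon = {"pokuneso": ("GRAY", 0.5, 1.0)}

def holds(category, obj):
    """True if the object's value falls inside the category's range."""
    channel, low, high = category
    return low <= obj[channel] <= high

def filter_context(category, context):
    """Names of all objects in the context that satisfy the category."""
    return [name for name, obj in context.items() if holds(category, obj)]

def speak(topic, context, lexicon):
    """Speaker: find a word whose meaning singles out the topic."""
    for word, category in lexicon.items():
        if filter_context(category, context) == [topic]:
            return word
    return None

def hear(word, context, lexicon):
    """Hearer: interpret the word and point to the unique referent, if any."""
    candidates = filter_context(lexicon[word], context)
    return candidates[0] if len(candidates) == 1 else None

word = speak("rectangle-1", context, lexicon)
guess = hear(word, context, lexicon)
success = guess == "rectangle-1"
```

Here the speaker says pokuneso, the hearer's filter leaves only rectangle-1, and the game succeeds, as in the commentator's report.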
Figure 6.2: Example scene used in game 500. 

Before examining the architecture behind these games in more detail, we can already 
see from Figure 6.4 that a1 and a2 clearly manage to build a communication system and 
its underlying ontology autonomously and from scratch by playing the guessing game. The 
communicative success moves up to reach almost 100% after a mere 500 games. Given 
that the environment keeps generating novel situations, there is always a chance that a 
scene occurs which requires new categories. So there is always a chance of failure, but 
such a failure triggers further expansion of the discrimination trees and of the lexicon. 
Figure 6.3: Semiotic triangle underlying game 500. 




Figure 6.4: Success (left y-axis) and average ontology size (right y-axis) for two agents 
playing 500 guessing games. 
Figure 6.4 also shows the average number of categories in each agent. There is a steep 
rise in the early phases, when no categories exist, but then the creation of new 
categories levels off as discrimination mostly succeeds. If the environment became 
more complex, possibly exercising additional sensory channels, the discrimination trees 
would start to expand again, as we have seen in the previous chapter, and then the 
lexicon would start to expand as well. Obviously the lexicon can only start to develop when 
there is an adequate ontology, which explains some of the delay before the 
communicative success curve starts to climb. 

Table 6.2 displays the complete lexicon of the two agents after 100 games, together 
with the score for each association for a1 and a2. Only associations where the score is 
above 0.0 for at least one agent are shown. A dash (-) indicates that the agent has not 
stored this association yet. 

Table 6.2: Complete lexicon of a1 and a2 after 100 games 

Meaning            Word      Translation  a1   a2 
[hpos 0.0-0.5]     vapola    left         -    0.1 
[hpos 0.5-1.0]     gonapa    right        0.1  - 
[height 0.0-0.5]   suwaxugo  short        0.6  0.8 
[height 0.5-1.0]   kusone    tall         0.4  0.5 
[width 0.0-0.5]    bepupepa  narrow       0.1  0.1 
[width 0.0-0.25]   kutaki    very narrow  -    0.1 
[width 0.5-1.0]    zikorika  wide         0.0  0.3 
[gray 0.0-0.5]     fesasado  light        0.5  0.7 
[gray 0.5-1.0]     pokuneso  dark         0.8  0.9 
[area 0.5-1.0]     mafanoda  large        0.1  0.1 



We see that, at this point, the agents have lexicalised only the most general 
distinctions, such as ‘dark’ (pokuneso) versus ‘light’ (fesasado) or ‘short’ (suwaxugo) versus 
‘tall’ (kusone). Words for the grayscale and height dimensions have the strongest scores, 
although this is purely accidental. If we started another simulation from scratch, 
we would end up with different words and perhaps other distinctions would be more 
successful. 

Table 6.3 is the complete lexicon after 500 games and Table 6.4 after 1000 games. 

We see that words for basic distinctions have further established themselves. Words 
for ‘short’ (suwaxugo) and ‘tall’ (kusone), or ‘light’ (fesasado) and ‘dark’ (pokuneso) now 
have scores of 1.0 or close to it. Words for more refined categories, like ‘very short’ 
(tawube) or ‘very narrow’ (kutaki), are beginning to establish themselves. 

Two steps in the evolution of the discrimination trees underlying this lexicon are 
shown in Figure 6.5. There is a progressive refinement of all trees as time goes on, be- 
cause all sensory channels have the same chance of being most salient. But the trees are 
Table 6.3: Lexicon of a1 and a2 after 500 games 

Meaning             Word      Translation   a1   a2 
[hpos 0.0-0.5]      vapola    left          0.7  1.0 
[hpos 0.5-1.0]      gonapa    right         0.6  0.5 
[vpos 0.0-0.5]      rixuzime  up            0.2  0.7 
[vpos 0.5-1.0]      gofugage  down          0.6  1.0 
[height 0.0-0.5]    suwaxugo  short         1.0  1.0 
[height 0.0-0.25]   tawube    very short    0.4  0.5 
[height 0.25-0.5]   narofi    medium short  0.1  0.4 
[height 0.5-1.0]    kusone    tall          1.0  1.0 
[height 0.5-0.75]   wuruzo    medium tall   0.3  0.6 
[height 0.75-1.0]   bowaluro  very tall     0.6  0.2 



Table 6.4: Lexicon of a1 and a2 after 1000 games 

Meaning            Word      Translation    a1   a2 
[width 0.0-0.5]    bepupepa  narrow         1.0  1.0 
[width 0.0-0.25]   kutaki    very narrow    0.1  0.5 
[width 0.25-0.5]   wukogo    medium narrow  0.2  - 
[width 0.5-1.0]    zikorika  wide           1.0  1.0 
[width 0.5-0.75]   mitula    medium wide    0.1  - 
[width 0.75-1.0]   wupixo    very wide      -    0.2 
[gray 0.0-0.5]     fesasado  light          1.0  1.0 
[gray 0.0-0.25]    sanize    very light     -    0.1 
[gray 0.5-1.0]     pokuneso  dark           0.9  1.0 
[gray 0.5-0.75]    wavosoru  medium dark    0.2  0.5 
[gray 0.75-1.0]    kuragoni  very dark      0.3  0.2 
[area 0.0-0.5]     babifewa  small          -    0.1 
[area 0.25-0.5]    togule    medium small   0.1  0.1 
[area 0.5-1.0]     mafanoda  large          0.2  0.5 
not the same for the two agents at every stage of development because even though they 
prefer to expand the salient channel, the agents have encountered different 
environmental situations in which different channels were salient. For example, after 100 
games, a1 has fewer refinements for the width channel than a2. After 500 games, all trees 
have at least one level of refinement. Not all categories have been lexicalised. For example, 
the width channel is three levels deep in both agents but no words exist yet for the deepest 
level. 







Figure 6.5: Evolution of the discrimination trees of a1 (left) and a2 (right). Snapshots 
have been taken after 100 games (top) and 500 games (bottom). 

Note that this simulation is very different from the ones shown in the previous 
chapter. The agents now get feedback only through overt selection of the referent. The hearer 
points to the identified referent and the speaker decides on the outcome of the game 
based on this non-verbal information, but speaker and hearer do not know whether they 
have used the same meaning or not. Very often there are alternative ways to 
conceptualise reality, so even if agents had completely shared ontologies, there would still 
be the possibility of guessing the wrong meaning. I have called this the gavagai-problem, 
inspired by the philosopher Quine, who tells the story of the anthropologist puzzled by the 
word gavagai uttered by a native in an undecoded language. Does gavagai mean rabbit, 
animal scurrying by, the direction in which I will go now, or white furry object? The 
child who is acquiring a lexicon has exactly the same problem. It explains why 
overextensions or underextensions are seen in a child’s first words. For example, the word for 
orange is applied to any small round object, including a ball or a doorknob. 

6.1.2 Input-output coupling 

Obviously the first thing I had to do to get these results was to make the outputs of one 
layer the inputs of the next (Figure 6.6). When the speaker has conceptualised the scene, the 
possible solutions enter the lexical layer for lexicon lookup. The resulting words get into 
a competition and the one with the highest score wins. In a more complete system with 
a syntactic layer, different lexicalisations would be considered by the syntactic layer to 
find the one that fits with the rest of the grammatical structure. 




Figure 6.6: Flow of solutions through coupled layers from perception to conceptualisa- 
tion and lexicalisation with re-entry links between them. 

The importance of having re-entry links now becomes clear. Which conceptualisation 
is finally chosen as the best one depends on the lexical layer, because 
the speaker should prefer those categories whose lexicalisation is best established if he 
wants to maximise success in the game. Due to the constant evolution of the lexicon 
and the presence of synonyms, the conceptual layer cannot know once and for all what 
the most appropriate conceptualisation from the viewpoint of language will be. And it 
needs to know the outcome of the lexical layer to later update the scores of participating 
categorisers. 
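The speaker's side of this coupling can be sketched as follows. The scores are those of a2 after 100 games in Table 6.2; the candidate meanings stand for the output of the conceptual layer for a hypothetical topic that is both dark and narrow. The function name and the data layout are assumptions made for illustration.

```python
# a2's associations after 100 games (Table 6.2): (meaning, word) -> score
a2_lexicon = {
    ("gray 0.5-1.0", "pokuneso"): 0.9,
    ("width 0.0-0.5", "bepupepa"): 0.1,
    ("width 0.0-0.25", "kutaki"): 0.1,
    ("height 0.0-0.5", "suwaxugo"): 0.8,
}

def lexicalise(candidate_meanings, lexicon):
    """Let the words for all candidate meanings compete on score.

    Returns (meaning, word, score) for the winner, or None when none of
    the candidate meanings has been lexicalised yet. The winning meaning
    is what flows back to the conceptual layer through re-entry.
    """
    options = [
        (score, meaning, word)
        for (meaning, word), score in lexicon.items()
        if meaning in candidate_meanings
    ]
    if not options:
        return None
    score, meaning, word = max(options)
    return meaning, word, score

# Hypothetical output of the conceptual layer for a dark, narrow topic.
candidates = ["gray 0.5-1.0", "width 0.0-0.5"]
choice = lexicalise(candidates, a2_lexicon)
```

With these scores the speaker settles on [gray 0.5-1.0] and the word pokuneso, because that lexicalisation is far better established than bepupepa for [width 0.0-0.5].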

The hearer uses the same layers but now with solutions flowing in the other 
direction. He gets a set of words which generate possible conceptualisations through lexicon 
lookup and these are then applied to the scene to find the referent (Figure 6.7). Because 
layers have this dual mode of operation, it is perfectly possible that the hearer has 
already guessed words, and thus has strong expectations, based on the scene and his own 
conceptualisation of it, although this has not been implemented in the Talking Heads 
yet. 
Figure 6.7: Layers operate in two directions. The flow of solutions in the hearer is shown 
from words to categories and perceptions and in the other direction through 
re-entry. 



Also for the hearer, the re-entrant flow is important. A hearer can only know which 
meaning was intended for a particular word after trying out the meaning on the scene. 
He therefore uses the context to determine the meaning of the utterance. For example, 
a particular word may mean both [large] and [dark], particularly during the phase of 
early language acquisition. However if only one of these categories picks out a single 
referent, it is chosen as the meaning, and the hearer will act upon this choice by pointing 
to the object it singles out from the scene. 
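The hearer's use of context can be sketched in the same style. The scene, the word wabaku and its two meanings are all hypothetical; they only illustrate the case in which a word means both [large] and [dark] and the scene decides between them. The sketch returns the first meaning that singles out one object and leaves aside the ranking that would be needed when several meanings do so.

```python
# Hypothetical scene: two large objects, only one of which is dark.
scene = {
    "obj-0": {"AREA": 0.8, "GRAY": 0.2},
    "obj-1": {"AREA": 0.7, "GRAY": 0.9},
    "obj-2": {"AREA": 0.3, "GRAY": 0.1},
}

# Hypothetical ambiguous word: [large] and [dark], as (channel, low, high).
hearer_lexicon = {"wabaku": [("AREA", 0.5, 1.0), ("GRAY", 0.5, 1.0)]}

def referents(category, scene):
    """Names of the objects that fall inside the category's range."""
    channel, low, high = category
    return [name for name, obj in scene.items() if low <= obj[channel] <= high]

def interpret(word, lexicon, scene):
    """Try each meaning of the word on the scene and keep the first one
    that picks out exactly one object. Returns (meaning, referent) or None."""
    for category in lexicon.get(word, []):
        found = referents(category, scene)
        if len(found) == 1:
            return category, found[0]
    return None

result = interpret("wabaku", hearer_lexicon, scene)
```

In this scene [large] leaves two candidates and is rejected, whereas [dark] leaves only obj-1, so the hearer takes [dark] as the meaning and points to obj-1.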

This architecture takes care of synonymy and ambiguity and makes sure that the most 
plausible form/meaning/referent chain stands out. A similar architecture may explain 
how humans effortlessly pick out the appropriate meanings from the many possible 
meanings a word typically has, without even being aware of the alternatives. If our language 
systems could not cope this way with synonymy and ambiguity, we would have had to 
use a lexicon where every word can have only a single meaning. Language has had to 
recruit whatever capacity was already available. 

6.1.3 Updating the scores 

The next thing I had to do was to reconsider the score updating mechanisms, even though they 
are basically the same as those used earlier (Figure 6.8). For a given referent, there are multiple 
meanings possible (in case more than one channel is considered to be sufficiently salient), 
and for each meaning there are multiple words. The best one of this whole lot is chosen 
by the speaker and used for the utterance transmitted to the hearer. We have seen that it 
is important for the hearer to use lateral inhibition based on the outcome of a game. But 
although the speaker is considering all the possible conceptualisations, lateral inhibition 
should only take place between the lexicalisations of the meaning that was finally chosen. 
So when the game was successful, the speaker increases the score of the association that 
was used by delta and decreases the scores of all the other associations with the same 
meaning. The value of delta is still set reasonably low, namely delta = 0.1. The hearer does 
the same as explained earlier for the Naming Game: the scores of all the alternative meanings 
for the word (or words) used by the speaker are decreased. When the game fails, both the 
speaker and the hearer decrease the score of the association that they used by delta. No 
change takes place to the scores of any of the other associations. 
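These update rules can be written down compactly. The sketch below makes three assumptions that the text does not settle: competing associations are decreased by the same delta as the used one is increased, scores are kept within the range 0 to 1, and the hearer's lateral inhibition covers only other meanings of the same word. The lexicon layout and the function names are hypothetical.

```python
DELTA = 0.1

def _bounded(score):
    # Assumption: scores stay within [0, 1]; rounding removes float noise.
    return round(min(1.0, max(0.0, score)), 2)

def update_speaker(lexicon, meaning, word, success, delta=DELTA):
    """lexicon maps (meaning, word) -> score and is modified in place."""
    used = (meaning, word)
    if success:
        lexicon[used] = _bounded(lexicon[used] + delta)
        for key in lexicon:
            m, w = key
            if m == meaning and w != word:  # other words, same meaning
                lexicon[key] = _bounded(lexicon[key] - delta)
    else:
        lexicon[used] = _bounded(lexicon[used] - delta)

def update_hearer(lexicon, meaning, word, success, delta=DELTA):
    """Same scheme, but inhibition hits other meanings of the same word."""
    used = (meaning, word)
    if success:
        lexicon[used] = _bounded(lexicon[used] + delta)
        for key in lexicon:
            m, w = key
            if w == word and m != meaning:  # other meanings, same word
                lexicon[key] = _bounded(lexicon[key] - delta)
    else:
        lexicon[used] = _bounded(lexicon[used] - delta)

def example_lexicon():
    return {
        ("dark", "pokuneso"): 0.5,
        ("dark", "faluleru"): 0.3,
        ("large", "pokuneso"): 0.3,
    }

speaker, hearer = example_lexicon(), example_lexicon()
update_speaker(speaker, "dark", "pokuneso", success=True)
update_hearer(hearer, "dark", "pokuneso", success=True)
```

After one successful game the used association rises to 0.6 in both agents. The speaker lowers the synonym faluleru to 0.2, while the hearer lowers the competing meaning [large] for pokuneso to 0.2.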




Figure 6.8: Score adjustments after a successful game. Used associations go up and com- 
peting associations go down. The game producing these relationships is dis- 
cussed later as game 10008. 

The scores of the categories and category combinations in the discrimination trees 
should also be updated. When a category or category combination is used as part of the 
communication in the game (as meaning for the speaker or the hearer), its use counter 
goes up. When the game is successful, its success counter goes up. To know the score of a 
particular category or category set, the agent simply divides success by use. Because the 
score of the categories thus depends on their success in the language game, a strong 
coordination gradually arises between conceptualisation and lexicalisation. After a while, 
categorisations will be preferred that are likely to yield successful language games, 
and of course the language only lexicalises categories that are nodes in discrimination 
trees. We thus get a progressive coordination of both the repertoire of categories and 
the lexicon, as I will discuss in more detail later. 
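The bookkeeping for category scores is simple enough to show in full. The class below is a sketch; its name and fields are hypothetical, and returning a score of 0.0 for a category that has never been used is an assumption, since the text only defines the score as success divided by use.

```python
from dataclasses import dataclass

@dataclass
class Category:
    channel: str
    low: float
    high: float
    use: int = 0        # games in which the category served as meaning
    success: int = 0    # how many of those games succeeded

    def record(self, game_succeeded: bool) -> None:
        """Update the counters after a game that used this category."""
        self.use += 1
        if game_succeeded:
            self.success += 1

    @property
    def score(self) -> float:
        # Assumption: an unused category scores 0.0.
        return self.success / self.use if self.use else 0.0

dark = Category("GRAY", 0.5, 1.0)
for outcome in (True, True, False, True):
    dark.record(outcome)
```

After three successes in four games the category [gray 0.5-1.0] has a score of 0.75.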

The other criteria discussed earlier (simplicity of the categories and level of depth in 
the tree) are still used for ranking the possible conceptualisations coming out of the 
lexicon, particularly when none of the meanings has been lexicalised yet or when 
there are multiple possibilities. The human brain is clearly capable of integrating many 
more criteria in lexical choice. For example, when talking to a child we might use more 
common words than we would use when talking to another adult. 

6.1.4 Repair processes 

Of course the other repair processes discussed earlier are still going on as well. Agents 
expand their discrimination trees when they fail to categorise and they invent new words 
or adopt words from the other agent if necessary. The task is more complicated than in 
the simple Naming Game because the hearer now gets no direct feedback about the meaning, 
only about the referent. In case of failure, the hearer must himself try to find a distinctive 
category or category set discriminating the referent from the other objects in the context. 

Here is an example game illustrating this type of repair process. The data for game 77 
(after sensor-scaling) is shown in Table 6.5. 

Table 6.5: Sensory data for game 77 after sensor-scaling. 

Obj  Hpos  Vpos  Height  Width  Gray  Area 
0    0.51  0.98  0.90    0.47   0.79  0.60 
1    0.75  0.68  0.26    0.09   0.14  0.20 
2    0.76  0.54  0.56    0.26   0.94  0.36 



The speaker is again a2, and height is the most salient channel. a2 has a word for [height 
0.75-1.0] (very tall), namely bowaluro, and uses it. The hearer a1 does not know the word, 
conceptualises the scene, and arrives at the same category, so the hearer stores the new 
word with the same meaning as the speaker. 

Game 77 

a2 is the speaker, a1 is the hearer. 
a2 segments the context into 3 objects: 
rectangle-0 rectangle-1 rectangle-2 
a2 chooses rectangle-1 as the topic 
a2 categorises the topic as [HEIGHT 0.75-1.0] 
a2 says: 'bowaluro' 

a1 does not know 'bowaluro' 
a1 says: 'bowaluro?' 

a2 points to rectangle-1 

a1 categorises the topic as [HEIGHT 0.75-1.0] 
a1 stores 'bowaluro' as [HEIGHT 0.75-1.0] 

a1 has guessed the right meaning of bowaluro because both agents 
use only the most salient channel and they share the same perception of reality. We 
will soon see that if these constraints are not valid, agents are not always so lucky and 
hence multiple meanings start to circulate for the same word. 

A similar repair action takes place when the hearer cannot guess a unique referent, as 
illustrated in game 96 drawn from the same simulation series. The scene contains three 
rectangles and the topic is the narrowest rectangle. The segments in the scene of game 
279 have the characteristics (after sensor-scaling) shown in Table 6.6. 

Table 6.6: Sensory data for game 279 

Obj  Hpos  Vpos  Height  Width  Gray  Area 
0    0.51  0.71  0.64    0.61   0.80  0.56 
1    0.51  0.91  0.53    0.41   0.42  0.41 



The hearer guessed the meaning used by the speaker right away, because both share 
the same perception and both use the most salient channel as basis for categorisation. 

Game 96 

a1 is the speaker, a2 is the hearer. 
a1 segments the context into 3 objects: 
rectangle-0 rectangle-1 rectangle-2 
a1 chooses rectangle-1 as the topic 
a1 categorises the topic as [WIDTH 0.0-0.25] 
a1 creates a new word: 'kutaki' 

a1 says: 'kutaki' 

a2 does not find a unique referent 
a2 says: 'kutaki?' 

a1 points to rectangle-1 

a2 categorises the topic as [WIDTH 0.0-0.25] 
a2 stores 'kutaki' as [WIDTH 0.0-0.25] 



6.2 Synonymy 

When we scale up the population, synonyms (several words for the same meaning) will 
start to appear. Indeed, when there are only two agents, the hearer picks up a word as 
soon as the speaker has created it. With a larger group, it is much more likely that agents 
create words not knowing that words already exist in the population, and it takes time 
for a new word to propagate. 

Synonyms are not a positive feature of a language. They make it less efficient for 
the speaker to find the most appropriate word to express a meaning, they require more 
memory to store the lexicon, and they confuse a new agent coming into the group. In 
natural languages, synonyms get damped, and we have seen in the previous chapter that 
the positive feedback loop between use and success (implemented by lateral inhibition 
in the agent’s score updating process) has the same effect. Let us now see whether this 
is still the case if the hearer does not get any feedback about the meaning used. 

The following simulation uses a group of ten agents. Each agent still uses only the 
most salient channel so that the agents can guess the meaning easily in case a word is 
not known. The evolution of communicative success for a series of 4000 games, which 
means about 800 games per agent, is shown in Figure 6.9. An effective lexicon and 
ontology are emerging because we see communicative success rise. 




Figure 6.9: Communicative success (left y-axis) and average ontology size (right y-axis) 
is shown for a series of 4000 language games played by 10 agents. 

Part of the lexicons of five of the ten agents after 2000 games (only those words which 
have positive scores) is shown in Table 6.7. We see that synonymy does indeed 
occur. For example, two words are in the running for [hpos 0.5-1.0]: rutaxese and xomupovi, 
and three for [vpos 0.5-1.0]: wavone, zaxawe, and dazofo. Some words are already well 
established, for example numefuli for [gray 0.0-0.5] (light). 
Table 6.7: Lexicon of five agents after 2000 games. 

Meaning           Word                a1   a2   a3   a4   a5 
[hpos 0.5-1.0]    rutaxese                 0.1 
                  xomupovi            0.1       0.2       0.1 
[vpos 0.5-1.0]    wavone                        0.3 
                  zaxawe                             0.1 
                  dazofo              0.1            0.2 
[width 0.0-0.5]   buxevo, vubupo      1.0  0.6  1.0  0.1  1.0 
[width 0.5-1.0]   pawixona                      0.1 
                  rikepule            1.0  0.8  1.0  1.0  1.0 
[width 0.5-0.75]  gowinoge, wesurodi            0.1  0.2 
[width 0.75-1.0]  besabi, lituvi      0.1       0.2 
[gray 0.5-0.75]   goxomixe, korufo         0.1       0.2 
[gray 0.0-0.5]    numefuli            1.0  0.7  1.0  1.0  1.0 
[gray 0.0-0.25]   rekemaxi            0.2 
[gray 0.5-1.0]    faluleru            0.6  0.6  1.0  1.0  1.0 
                  nupanu                                  0.2 




Figure 6.10: In the case of synonymy, there are multiple words for the same meaning and 
hence the same referent. 
The positive feedback loop between use and success has already dampened some 
synonyms. For some cases, like [gray 0.5-1.0], the competition has died out with one word, 
faluleru, being the winner. For others, like [hpos 0.5-1.0], the competition is still going 
on, although we can guess that xomupovi is probably going to be the winner. 

It is instructive to follow the history of the words in use for a particular meaning. For 
example, let us look at the words for [gray 0.5-1.0] (dark) in the very early phases of 
lexicon development. Three different words are quickly created, and faluleru is the first 
one that has some success. 

Game 76 
Speaker a4 creates 'notabefe' for [GRAY 0.5-1.0] 
Hearer a5 adopts 'notabefe' for [GRAY 0.5-1.0] 

Game 79 
Speaker a2 creates 'vivevobo' for [GRAY 0.5-1.0] 
Hearer a7 does not adopt 'vivevobo' 
(failed to discriminate) 

Game 88 
Speaker a7 creates 'faluleru' for [GRAY 0.5-1.0] 
Hearer a4 adopts 'faluleru' for [GRAY 0.5-1.0] 

Game 100 
Speaker a7 uses 'faluleru' for [GRAY 0.5-1.0] 
Hearer a4 correctly interprets 'faluleru' 

The semiotic triangles existing at this point in the population are summarised in Fig- 
ure 6.10. Agent a4 is now already in a dilemma because he has created notabefe and 
picked up faluleru from a7. 

With a lower word creation rate (w_c), fewer new words would be created. The initial 
bootstrapping of the lexicon would then take a bit longer but the process of weeding 
out synonymy would be shorter. With a lower word adoption rate (w_a), agents are less 
inclined to adopt a word and that again diminishes the chance that new words spread, 
if words already exist for the same meaning. But even with high word creation and 
word adoption rates, the whole system stabilises automatically. The stronger a lexicon 
is already in place, the fewer new synonyms arise because new words created by virgin 
agents entering the population have hardly any chance to propagate. 

Note that the word creation rate can never be completely zero because then the agents 
would no longer be able to handle new meanings. The word adoption rate can never 
be equal to zero either because then new words cannot spread in the population and 
there would be a high chance that the lexicon does not become coherent with subgroups 
getting stuck with different words for the same meaning. 
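The role of the two rates can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration rather than the implementation used in the experiment: the rates are treated as probabilities, new words are random consonant-vowel strings, new associations start with a score of 0.0, and the names `make_word`, `speaker_word` and `hearer_adopt` are hypothetical.

```python
import random

CONSONANTS = "bdfgklmnprstvwxz"  # assumed alphabet, chosen to resemble the words in the text
VOWELS = "aeiou"


def make_word(rng, syllables=3):
    """Build a random word out of consonant-vowel syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))


def speaker_word(lexicon, meaning, w_c, rng):
    """Return a word for `meaning`; if none is known, create one with probability w_c."""
    words = lexicon.setdefault(meaning, {})
    if words:
        return max(words, key=words.get)  # best-scoring known word
    if rng.random() < w_c:
        word = make_word(rng)
        words[word] = 0.0  # assumed initial score
        return word
    return None  # no word available: the speaker stays silent


def hearer_adopt(lexicon, word, meaning, w_a, rng):
    """Store an unknown word for the guessed meaning with probability w_a."""
    if rng.random() < w_a:
        lexicon.setdefault(meaning, {}).setdefault(word, 0.0)
        return True
    return False
```

With `w_c` equal to zero `speaker_word` never returns a word for a new meaning, and with `w_a` equal to zero `hearer_adopt` never stores anything, which are the two degenerate cases described above.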

Because there are synonyms, agents must now choose which word to use. The one 
with the highest score should clearly be preferred because, based on the evidence the 
agent has gathered so far, this gives the highest chance of success in the game. a4 is 
faced with this kind of choice in the next game in the series involving the meaning 
[gray 0.5-1.0]. a4 chooses faluleru because this word has the highest score. It is immediately 
picked up by a1: 
Game 101 

a4 is the speaker, a1 is the hearer. 
a4 segments the context into 2 objects: 
  rectangle-0 rectangle-1 
a4 chooses rectangle-1 as the topic 
a4 categorises the topic as [GRAY 0.5-1.0] 
a4 has two words for [GRAY 0.5-1.0]: 
  'faluleru' (0.20) 
  'notabefe' (0.00) 
a4 says: 'faluleru' 
a1 does not know 'faluleru' 
a1 says: 'faluleru?' 
a4 points to rectangle-1 
a1 categorises the topic as [GRAY 0.5-1.0] 
a1 stores 'faluleru' as [GRAY 0.5-1.0] 

After this game, three agents “know” the word faluleru for [gray 0.5-1.0]: a4, a1, and a7. 
When we continue to inspect the simulation we see that notabefe takes a bit of a revenge. 
The next games with the meaning [gray 0.5-1.0] all involve the word notabefe: 

Game 111 
Speaker a5 uses 'notabefe' for [GRAY 0.5-1.0] 
Hearer a4 correctly interprets 'notabefe' 

Game 113 
Speaker a4 uses 'notabefe' for [GRAY 0.5-1.0] 
Hearer a9 adopts 'notabefe' for [GRAY 0.5-1.0] 

Game 127 
Speaker a9 uses 'notabefe' for [GRAY 0.5-1.0] 
Hearer a6 adopts 'notabefe' for [GRAY 0.5-1.0] 

But then there is another occurrence of faluleru: 

Game 135 
Speaker a1 uses 'faluleru' for [GRAY 0.5-1.0] 
Hearer a2 adopts 'faluleru' for [GRAY 0.5-1.0] 

And now a6, which has not been involved yet in any interaction concerning the meaning 
[gray 0.5-1.0], further confuses the situation by creating a new word, sopine, which a5, 
playing the role of hearer, adopts. 

Game 140 
Speaker a6 creates 'sopine' for [GRAY 0.5-1.0] 
Hearer a5 adopts 'sopine' for [GRAY 0.5-1.0] 

This kind of evolution continues with a struggle between notabefe and faluleru. The 
agents know both words so the games do not fail. But faluleru, just by chance, starts 
to occur a bit more often which causes its score to go up a bit more. This results in 
faluleru being used even more, and, due to lateral inhibition, notabefe used less. Gradually 
faluleru dominates for the whole population. After 2000 games, the competitors to 
faluleru have all disappeared from the group lexicon. 



6.3 Ambiguity 

We now continue in small steps to scale up the challenge to the agents by progressively 
adding more realism to the simulation. So far I assumed that agents use only the most 
salient channel for categorising the topic and that their perception of the world is 
identical. Consequently a hearer can guess with 100% success the meaning of an unknown 
word and it is therefore no wonder that the agents arrive at a shared communication 
system, even though neither the lexicon nor the repertoire of categories has been 
supplied in advance by a designer, nor has their development been centrally coordinated. 
The main problem for them so far is the rise of synonyms which need to be damped to 
increase the probability of successful and efficient communication. 

I now relax the saliency assumption. When there is a lower saliency threshold, more 
than one sensory channel is considered by the conceptual layer, possibly leading to sev- 
eral alternative conceptualisations of the scene. The question is then whether the agents 
are still able to reach a shared communication system despite the unavoidable word am- 
biguities that this generates. 

We begin again by looking at simulations with two agents so that we can clearly 
see the impact of multiple conceptualisations. The architecture of the agents has not 
changed; I only lowered the saliency threshold. The scenes have become a bit more 
complex as well. They now contain not only rectangles but also squares, circles, and 
triangles. This has a limited impact because the same sensory channels are used as before 
and none of them is really sensitive to shape properties. 




Figure 6.11: Scene used in game 3. 
6.3.1 How words may still get the same meaning 

When inspecting the simulation results, we see first of all that we may still accidentally 
get the same situation as before, i.e. one where the hearer selects the same channel as 
the speaker for conceptualising the scene, even though several channels are salient. This 
happens in the following game which involves a scene with a rectangle and a square 
(Figure 6.11). 

The data for game 3, after sensor-scaling, are shown in Table 6.8. 

Table 6.8: Sensory data for game 3 after scaling. 

Obj   Hpos   Vpos   Height   Width   Gray   Area 
0     0.37   0.44   0.92     0.92    0.69   0.10 
1     0.98   0.55   0.25     0.11    0.78   0.88 

Values for vpos and gray are very close so they are not considered as sufficiently 
salient. 

Game 3 

a1 is the speaker, a2 is the hearer. 
a1 segments the context into 2 objects: 
  square-0 rectangle-1 
a1 chooses rectangle-1 as the topic 
a1 considers as salient AREA WIDTH HEIGHT HPOS 
a1 categorises the topic as [HEIGHT 0.0-0.5] 
a1 creates a new word: 'mibati' 
a1 says: 'mibati' 
a2 does not know 'mibati' 
a2 says: 'mibati?' 
a1 points to rectangle-1 
a2 considers as salient AREA WIDTH HEIGHT HPOS 
a2 categorises the topic as [HEIGHT 0.0-0.5] 
a2 stores 'mibati' as [HEIGHT 0.0-0.5] 

[height 0.0-0.5] has been chosen by the speaker because height was one of the salient 
channels (even though clearly not the only salient one) and because a successful 
distinction already existed in the ontology. The same distinction was chosen by chance by the 
hearer, but he could just as well have chosen a distinction based on the area, width or 
hpos. 
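How a list of salient channels such as the one in the trace of game 3 might be obtained from the sensory data can be sketched as follows. The rule used here is an assumption made for illustration: a channel counts as salient when the topic's value differs from the value of every other object by more than a threshold. The threshold of 0.2 is hypothetical and was chosen only because it reproduces the channel lists reported in the traces for games 3 and 9.

```python
CHANNELS = ["hpos", "vpos", "height", "width", "gray", "area"]


def salient_channels(context, topic, threshold=0.2):
    """Channels on which the topic stands out from all other objects.

    `context` is a list of objects (dicts from channel name to scaled value),
    `topic` is the index of the topic. The min-gap rule and the default
    threshold are assumptions of this sketch.
    """
    others = [obj for i, obj in enumerate(context) if i != topic]
    salient = set()
    for channel in CHANNELS:
        gap = min(abs(context[topic][channel] - other[channel]) for other in others)
        if gap > threshold:
            salient.add(channel)
    return salient


# Sensory data for game 3 (Table 6.8); object 1 (rectangle-1) is the topic.
game3 = [
    dict(hpos=0.37, vpos=0.44, height=0.92, width=0.92, gray=0.69, area=0.10),
    dict(hpos=0.98, vpos=0.55, height=0.25, width=0.11, gray=0.78, area=0.88),
]
print(sorted(salient_channels(game3, topic=1)))  # ['area', 'height', 'hpos', 'width']
```

The values for vpos (0.44 versus 0.55) and gray (0.69 versus 0.78) lie too close together to pass the threshold, which matches the remark that these two channels are not sufficiently salient in game 3.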

In the next game of the simulation series, mibati is used again, now by a2 as speaker. 
It is based on a scene with a triangle and a rectangle. The data for game 4 are shown in 
Table 6.9 (after scaling). 
Table 6.9: Lexicon of a1 and a2 after 500 games. 

obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY   AREA 
0     0.47   0.23   0.90     0.83    0.49   0.34 
1     0.21   0.22   0.67     0.79    0.86   0.63 

The rectangle is chosen as topic. Two alternative categories can be used by a2 (and 
are listed by the commentator) but the one preferred is the one with the strongest lexi- 
calisation, which is mibati. 

Game 4 

a2 is the speaker, a1 is the hearer. 
a2 segments the context into 2 objects: 
  triangle-0 rectangle-1 
a2 chooses rectangle-1 as the topic 
a2 considers as salient AREA GRAY HEIGHT HPOS 
a2 categorises the topic as [HEIGHT 0.0-0.5] 
  [GRAY 0.5-1.0] 
a2 has the word 
  mibati for [HEIGHT 0.0-0.5] (1.0) 
a2 says: 'mibati' 
a1 interprets 'mibati' as [HEIGHT 0.0-0.5] 
a1 points to rectangle-1 
a2 signals OK 

This example illustrates two points. First of all the language interpretation process 
influences which conceptualisation of the scene is preferred. For the speaker, both [height 
0.0-0.5] (short) and [gray 0.5-1.0] (dark) are possible ways to distinguish the topic from 
the other objects in the context. But [height 0.0-0.5] is chosen because its lexicalisation 
has a higher score. Second, we begin to see why agents will manage to coordinate 
their ontologies, even though they do not have any direct feedback about each other’s 
internal structures. The score of [height 0.0-0.5] goes up after this game and if the 
agent has to choose a category later purely based on the score of the categories themselves, 
[height 0.0-0.5] will be the one being preferred. Unless [gray 0.5-1.0] manages 
to become successfully lexicalised itself, it even risks being pruned away. 

6.3.2 How words get different meanings 

Here is an example where ambiguity slips into the lexicon. The game involves two objects: 
a rectangle and a square (Figure 6.12). The data for game 9, after sensor-scaling, are 
shown in Table 6.10. 

The discrimination trees of the two agents at this point are shown in Figure 6.13. The 
speaker uses a distinction based on the vpos channel, namely [vpos 0.5-1.0] (top), creates 
a word for it, puxazi, and transmits this to the hearer. The hearer does not know 
the word, conceptualises the scene based on the non-verbal hint from the speaker, and 
identifies the category [gray 0.0-0.5] (light) as distinctive. So this meaning is stored and 
it is different from the one used by the speaker. The current constellation of meanings is 
summarised in Figure 6.14. 

Figure 6.12: Scene used in game 9. Square-1 is the topic. Several conceptualisations are 
possible so the agents get divergent meanings. 

Figure 6.13: Discrimination trees of speaker a1 (left) and hearer a2 (right) available in 
game 9. 

Table 6.10: Data for game 9 after sensor-scaling. 

obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY   AREA 
0     0.92   0.38   0.59     0.83    0.36   0.61 
1     0.32   0.88   0.54     0.54    0.12   0.42 

Figure 6.14: Semiotic triangles underlying game 9. For the same referent and the same 
word, there are two different meanings. Dashed lines indicate the relations 
used by the speaker a2. Straight lines indicate those used by the hearer a1. 

Game 9 

a1 is the speaker, a2 is the hearer. 
a1 segments the context into 2 objects: 
  rectangle-0 square-1 
a1 chooses square-1 as the topic 
a1 considers as salient GRAY WIDTH VPOS HPOS 
a1 categorises the topic as [VPOS 0.5-1.0] 
a1 creates a new word: 'puxazi' 
a1 says: 'puxazi' 
a2 does not know 'puxazi' 
a2 says: 'puxazi?' 
a1 points to square-1 
a2 considers as salient GRAY WIDTH VPOS HPOS 
a2 categorises the topic as [GRAY 0.0-0.5] 
a2 stores 'puxazi' as [GRAY 0.0-0.5] 

A subsequent game (game 11) illustrates that despite semantic incoherence, a game 
can still succeed. The agents do not know that each of them means something else by 
puxazi and if the meanings are compatible, they have no reason to change their internal 
lexicon: 
Game 11 

a2 is the speaker, a1 is the hearer. 
a2 segments the context into 2 objects: 
  circle-0 rectangle-1 
a2 chooses rectangle-1 as the topic 
a2 considers as salient AREA GRAY WIDTH HPOS 
a2 categorises the topic as [GRAY 0.0-0.5] 
a2 says: 'puxazi' 
a1 interprets 'puxazi' as [VPOS 0.5-1.0] 
a1 points to rectangle-1 
a2 signals OK 

Figure 6.15: Semiotic triangles used in game 14. Dashed lines are the relations used by the 
hearer a2. Straight lines those of the speaker a1. After this game, a2 adopts 
yet another meaning for puxazi, namely [height 0.5-1.0]. 

Next is a game (game 14) where a2 comes to adopt another meaning of puxazi, 
because the first meaning does not work in the present context. The game involves three 
objects: two rectangles and a triangle (Figure 6.16). The data for game 14, before context 
scaling, are shown in Table 6.11. 

Table 6.11: Sensory data for game 14 after scaling. 

obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY   AREA 
0     0.60   0.49   0.12     0.09    0.78   0.06 
1     0.95   0.71   0.51     0.60    0.07   0.15 
2     0.29   0.31   0.29     0.65    0.03   0.33 
a1 conceptualises the scene using [vpos 0.5-1.0], which he has lexicalised as puxazi. 
For a2, puxazi means [gray 0.0-0.5] (light), but this meaning identifies both rectangle-2 
and triangle-1 (Figure 6.15). The game therefore fails and the speaker points to the topic. 
The hearer conceptualises the scene based on this non-verbal hint from the speaker using 
the height dimension and adopts this as the second meaning of puxazi. 

Game 14 

a1 is the speaker, a2 is the hearer. 
a1 segments the context into 3 objects: 
  rectangle-0 triangle-1 rectangle-2 
a1 chooses triangle-1 as the topic 
a1 considers as salient HEIGHT VPOS HPOS 
a1 categorises the topic as 
  [VPOS 0.5-1.0] [HEIGHT 0.5-1.0] 
a1 says: 'puxazi' 
a2 interprets 'puxazi' as [GRAY 0.0-0.5] 
a2 identifies rectangle-2 triangle-1 
a2 says: 'puxazi?' 
a1 points to triangle-1 
a2 considers as salient HEIGHT VPOS HPOS 
a2 categorises the topic as [HEIGHT 0.5-1.0] 
a2 stores 'puxazi' as [HEIGHT 0.5-1.0] 

Note that the hearer could in principle also have categorised the scene using vpos and 
hpos, because they are equally salient. So there was absolutely no guarantee that puxazi 
would have been associated with [height 0.5-1.0] by a2. Moreover a2 stores an 
association between puxazi and [height 0.5-1.0], even though there is already another word 
for [height 0.5-1.0] in his lexicon. The fact that several meanings are possible to 
distinguish the topic in a given context has not only the consequence that ambiguity arises 
but also that synonyms may enter into the lexicon, even with only two agents! 
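Why game 14 fails for the hearer while the speaker's category works can be checked directly against the values in Table 6.11. The sketch below is illustrative only: the helper name `referents` is hypothetical, and treating a category as a closed interval on one channel is an assumption (no value in this scene falls on a boundary, so the choice does not matter here).

```python
def referents(context, channel, low, high):
    """Names of the objects whose value on `channel` lies in [low, high]."""
    return [name for name, values in context.items() if low <= values[channel] <= high]


# Sensory data for game 14, copied from Table 6.11.
game14 = {
    "rectangle-0": dict(hpos=0.60, vpos=0.49, height=0.12, width=0.09, gray=0.78, area=0.06),
    "triangle-1": dict(hpos=0.95, vpos=0.71, height=0.51, width=0.60, gray=0.07, area=0.15),
    "rectangle-2": dict(hpos=0.29, vpos=0.31, height=0.29, width=0.65, gray=0.03, area=0.33),
}

# Hearer's meaning of puxazi, [gray 0.0-0.5]: two candidates, so no unique referent.
print(referents(game14, "gray", 0.0, 0.5))    # ['triangle-1', 'rectangle-2']
# Speaker's meaning, [vpos 0.5-1.0]: only the topic.
print(referents(game14, "vpos", 0.5, 1.0))    # ['triangle-1']
# The meaning the hearer adopts afterwards, [height 0.5-1.0]: again only the topic.
print(referents(game14, "height", 0.5, 1.0))  # ['triangle-1']
```

A meaning is usable in a given context only when it picks out exactly one object, which is why the hearer has to fall back on the speaker's pointing gesture in this game.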

6.3.3 Competition between word meanings 

Once a word has more than one meaning (ambiguity), and once several words exist 
for the same meaning (synonymy), a struggle between word-meaning pairs sharing the 
same word or the same meaning develops. The lateral inhibition carried out by the 
speaker in case of a successful game pushes down alternative lexicalisations for the same 
meaning, thus damping synonyms, and the lateral inhibition carried out by the hearer 
pushes down alternative meanings for the same word, thus damping ambiguity. 

Game 25 illustrates this effect of lateral inhibition. The situation before the game is 
as depicted in Figure 6.17. Two words (with different meanings) are competing in a2 to 
identify the referent: puxazi (meaning [gray 0.0-0.5], i.e. light) and torigusu (meaning 
[vpos 0.0-0.5], i.e. lower). Of the two, puxazi has a higher score and wins the competition. 
Because the game was successful, the score of puxazi goes up. The score of torigusu does not 
change because it concerns another meaning. Two meanings for puxazi are competing 
in a1: [gray 0.0-0.5] (light) and [vpos 0.0-0.5] (down). The meaning [vpos 0.0-0.5] gets 
damped and the score of [gray 0.0-0.5] goes up, thus helping to further disambiguate puxazi. 

Figure 6.16: Scene used in game 14. Synonymy arises because the hearer conceptualises 
the scene differently from the speaker. 

Figure 6.17: Semiotic triangles used in game 25. The score of the relation between [gray 
0.0-0.5] and puxazi is increased in both agents. The relation between puxazi 
and [vpos 0.0-0.5] gets damped. 

Game 25 

a2 is the speaker, a1 is the hearer. 
a2 segments the context into 2 objects: 
  rectangle-0 rectangle-1 
a2 chooses rectangle-1 as the topic 
a2 considers as salient GRAY VPOS 
a2 categorises the topic as [VPOS 0.0-0.5] 
  [GRAY 0.0-0.5] [GRAY 0.0-0.25] 
a2 has the words 
  puxazi for [GRAY 0.0-0.5] (0.20) 
  torigusu for [VPOS 0.0-0.5] (0.0) 
a2 says: 'puxazi' 
a1 interprets 'puxazi' as 
  [GRAY 0.0-0.5] (0.20) 
  [VPOS 0.0-0.5] (0.0) 
a1 points to rectangle-1 
a2 signals OK 

After this game, the association between [gray 0.0-0.5] and puxazi will have a score of 
0.3 in both agents. The association between puxazi and [vpos 0.0-0.5] in the hearer is 
decreased, but because it was already 0.0, it cannot decrease further. 
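The score changes after game 25 can be reproduced with a small sketch of lateral inhibition. Several things here are assumptions made for illustration: the step size of 0.1 is inferred from the change from 0.2 to 0.3 reported above, the decrement is taken to be equal to the increment, scores are kept between 0.0 and 1.0, and the function names are hypothetical.

```python
DELTA = 0.1  # assumed step; consistent with the 0.2 -> 0.3 change in game 25


def clamp(score):
    """Keep a score within [0.0, 1.0]; rounding avoids floating-point noise."""
    return max(0.0, min(1.0, round(score, 2)))


def update_after_success(lexicon, word, meaning, role):
    """Reward the used word-meaning pair and inhibit its competitors.

    `lexicon` maps (word, meaning) pairs to scores. The speaker inhibits
    other words for the same meaning (synonyms); the hearer inhibits other
    meanings of the same word (ambiguity).
    """
    for (w, m), score in list(lexicon.items()):
        if w == word and m == meaning:
            lexicon[(w, m)] = clamp(score + DELTA)
        elif role == "speaker" and m == meaning:
            lexicon[(w, m)] = clamp(score - DELTA)
        elif role == "hearer" and w == word:
            lexicon[(w, m)] = clamp(score - DELTA)


GRAY, VPOS = "[GRAY 0.0-0.5]", "[VPOS 0.0-0.5]"
speaker_a2 = {("puxazi", GRAY): 0.2, ("torigusu", VPOS): 0.0}
hearer_a1 = {("puxazi", GRAY): 0.2, ("puxazi", VPOS): 0.0}

update_after_success(speaker_a2, "puxazi", GRAY, role="speaker")
update_after_success(hearer_a1, "puxazi", GRAY, role="hearer")
print(speaker_a2[("puxazi", GRAY)], hearer_a1[("puxazi", GRAY)])  # 0.3 0.3
print(hearer_a1[("puxazi", VPOS)])  # 0.0 (already at the floor)
```

In the speaker's lexicon torigusu is left untouched because it expresses a different meaning, whereas in the hearer's lexicon the competing meaning of puxazi is inhibited but cannot drop below zero.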

Note that the speaker not only conceptualises the scene using the most generic distinc- 
tions ([gray 0.0-0.5], [vpos 0.0-0.5]) but also with more specific ones ([gray 0.0-0.25]). 
Indeed it may happen that there is no word for a more generic distinction, but there 
is one for a more specific one, in which case it should be used. So, all discriminative 
distinctions, whatever their level of detail, are transmitted from the categorisation layer 
to the lexical layer and it is up to the lexical layer to choose. Of course, everything else 
being equal, other criteria are still important. If there is a more abstract category (mean- 
ing one higher in the discrimination tree) it is preferred over a more specific one if they 
have equal lexical scores. 
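The choice made by the lexical layer can be sketched as a selection over all lexicalised distinctions. The sketch assumes that each candidate carries the depth of its category in the discrimination tree (1 for the first split of a channel, 2 for the next refinement); the field names and the function `choose_expression` are hypothetical, and the words in the second example are invented placeholders.

```python
def choose_expression(candidates):
    """Pick the candidate with the highest lexical score.

    Ties are resolved in favour of the more abstract category, i.e. the one
    with the smaller depth in the discrimination tree. Categories without
    any word simply do not appear among the candidates.
    """
    return max(candidates, key=lambda c: (c["score"], -c["depth"]))


# Candidates of a2 in game 25; [GRAY 0.0-0.25] has no word and is therefore absent.
game25 = [
    {"word": "puxazi", "category": "[GRAY 0.0-0.5]", "depth": 1, "score": 0.20},
    {"word": "torigusu", "category": "[VPOS 0.0-0.5]", "depth": 1, "score": 0.0},
]
print(choose_expression(game25)["word"])  # puxazi

# Hypothetical tie: equal scores, so the more abstract category wins.
tie = [
    {"word": "wordB", "category": "[GRAY 0.0-0.25]", "depth": 2, "score": 0.4},
    {"word": "wordA", "category": "[GRAY 0.0-0.5]", "depth": 1, "score": 0.4},
]
print(choose_expression(tie)["word"])  # wordA
```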

6.3.4 Lexical and ontological development 

Despite the additional complication of mutually compatible meanings for unknown words, 
the two agents nevertheless manage to build up a shared communication system, as can 
be seen from Figure 6.18. Each time the ontology is extended, communicative success 
dips because a new word needs to be acquired, but the agents clearly manage to become 
successful in the guessing game. 

Success does not mean that the lexicons are completely identical. As we have seen in 
game 11, it is possible to have communicative success with different meanings for the 
same word as long as the different meanings pick out the same referent. Of course in 
the domain of the geom world, it is a pure coincidence that two categories pick out the 
same referent and therefore alternative meanings for the same word will get damped. 
However, if there are more regularities, ambiguity persists much longer. In fact, ambi- 
guity may persist in natural languages if the different meanings of a word are so closely 
related that it is sufficiently often unclear which meaning is intended. 

Here are some snapshots of the developing lexicon. After 30 games, the lexicon of the 
established words (i.e. associations with a score greater than 0.0 for at least one agent) 
is as in Table 6.12. 

Note that there are two meanings for puxazi: [gray 0.0-0.5] (light) and [height 0.5- 
1.0] (tall). 

The lexicon after 200 games is shown in Table 6.13. 

The second meaning of puxazi has now disappeared from the lexicon. So we basically 
see the same situation as before when only one most salient channel was considered by 
the agents. Words for more general distinctions happen to be lexicalised first because 
they are more often useful in the game, but when needed, words for more specific 
meanings start to develop. 

Figure 6.18: Communicative success (left y-axis) and average ontology size (right y-axis) 
for a series of 2000 language games played by two agents. 



Table 6.12: Group lexicon after 30 games. 

Meaning            Word       Translation    a1    a2 
[vpos 0.0-0.5]     torigusu   left           0.4   0.3 
[height 0.0-0.5]   mibati     short          0.6   0.6 
[height 0.5-1.0]   puxazi     tall           0.2   0.0 
[gray 0.0-0.5]     puxazi     light          0.6   0.7 
[gray 0.25-0.5]    turawa     medium light   0.1   0.0 
[gray 0.5-1.0]     xubevilo   dark           0.0   0.1 
Table 6.13: Group lexicon after 200 games. 

Meaning            Word       Translation    a1    a2 
[hpos 0.0-0.5]     lefividi   left           0.2   0.2 
[hpos 0.5-1.0]     vuvovo     right          0.2   0.4 
[vpos 0.0-0.5]     torigusu   left           0.8   0.2 
[vpos 0.5-1.0]     rugomoto   right          1.0   1.0 
[height 0.0-0.5]   mibati     short          1.0   0.9 
[gray 0.0-0.5]     puxazi     light          0.3   0.6 
[gray 0.25-0.5]    turawa     medium light   0.4   0.4 
[gray 0.5-1.0]     xubevilo   dark           0.4   0.5 
[gray 0.5-0.75]    visuxa     very dark      0.3   0.3 




6.4 Scaling up 

Comforted by having reached a new plateau in the challenges confronting the agents, I 
now go one step further. First we scale up the population to see whether, despite 
the ambiguity now persistently present, a larger group of agents still manages to develop 
a shared communication system. 

6.4.1 Increasing the population size 

We already know from the previous section that a larger population automatically in- 
creases the risk for synonymy. Figure 6.19 shows the evolution of communicative suc- 
cess and average ontology size in a population of ten agents. We see again a steady 
progression towards an effective communication system. Recall that these games are 
played with randomly generated scenes from the geom world. 

It is instructive to examine in detail the lexicon that has emerged after this series, 
where every agent has on average played about 2000 games. Table 6.14 shows 
word-meaning pairs whose frequency is larger than 0.8. 

We see that for all the sensory channels, solid words exist for the top level categories. 
There is however one exception: there are no words in this group for area nor for 
[height 0.5-1.0]. We do find these words, and words for more refined notions as well, 
in the batch of word-meaning pairs whose scores are between 0.5 and 0.8, shown in 
Table 6.15. 

Although most of these words are on their way towards total coherence, because the 
competition has already been damped completely, this is less the case for area/height 
words. Closer inspection reveals that there are two words competing for expressing 
area and height: texiraxi and wixizode (Figure 6.20). Both words have both meanings 
but there is a strong divergence of opinion in the population. Some prefer the area 
meaning, others prefer the height meaning. This can be seen from the scores of the 
different meanings. 

Figure 6.19: Communicative success (left y-axis) and average ontology size (right y-axis) 
for a series of 20,000 language games played by ten agents. 

Table 6.14: Group lexicon after 2000 games. 

Word       Meaning            Translation   Frequency 
larubo     [hpos 0.0-0.5]     left          1.00 
tituroxu   [hpos 0.5-1.0]     right         1.00 
fumetese   [vpos 0.0-0.5]     top           1.00 
tokadapa   [vpos 0.5-1.0]     bottom        1.00 
povomovi   [width 0.0-0.5]    thin          1.00 
kilokawe   [width 0.5-1.0]    wide          1.00 
legoka     [height 0.0-0.5]   short         0.94 
vuwusugu   [GREY 0.0-0.5]     light         1.00 
kewenoku   [GREY 0.5-1.0]     dark          1.00 

Table 6.15: Table of word meaning pairs and their average scores. 

Word       Meaning             Translation    Frequency 
nifavipa   [hpos 0.0-0.25]     very left      0.55 
nodanova   [vpos 0.0-0.25]     very top       0.58 
texiraxi   [height 0.5-1.0]    tall           0.69 
poxalu     [height 0.0-0.25]   very short     0.70 
fovibilo   [height 0.25-0.5]   medium short   0.61 
wixizode   [area 0.5-1.0]      large          0.78 
tebuwona   [GREY 0.75-1.0]     very dark      0.76 
mogevo     [GREY 0.25-0.5]     medium light   0.60 
toduwe     [GREY 0.0-0.25]     very light     0.65 

Figure 6.20: Two different words have the same two meanings and have difficulty 
disambiguating because they are often both equally distinctive in a given situation. 

For the words texiraxi and wixizode, the scores of the different agents for the area and 
height categories are as in Table 6.16. 
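The divergence of opinion can be quantified by counting, for each word, how many agents give the higher score to the height meaning and how many to the area meaning. The sketch below copies the scores from Table 6.16; the function name `tally` is hypothetical.

```python
# (height score, area score) per agent, copied from Table 6.16.
texiraxi = {
    "a1": (0.0, 0.9), "a2": (0.0, 1.0), "a3": (0.9, 0.2), "a4": (1.0, 0.0),
    "a5": (1.0, 0.0), "a6": (1.0, 0.0), "a7": (1.0, 0.0), "a8": (1.0, 0.0),
    "a9": (1.0, 0.0), "a10": (0.0, 1.0),
}
wixizode = {
    "a1": (0.6, 0.4), "a2": (1.0, 0.0), "a3": (0.0, 1.0), "a4": (0.0, 1.0),
    "a5": (0.2, 0.6), "a6": (0.0, 1.0), "a7": (1.0, 1.0), "a8": (1.0, 1.0),
    "a9": (1.0, 1.0), "a10": (0.0, 0.8),
}


def tally(scores):
    """Count the agents preferring each meaning of a word; equal scores count as a tie."""
    counts = {"height": 0, "area": 0, "tie": 0}
    for height, area in scores.values():
        if height > area:
            counts["height"] += 1
        elif area > height:
            counts["area"] += 1
        else:
            counts["tie"] += 1
    return counts


print(tally(texiraxi))  # {'height': 7, 'area': 3, 'tie': 0}
print(tally(wixizode))  # {'height': 2, 'area': 5, 'tie': 3}
```

A majority thus uses texiraxi for height and wixizode for area, but a sizeable minority holds the opposite preference or is undecided.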

Table 6.16: Scores for area and height categories. 

        Scores texiraxi    Scores wixizode 
Agent   HEIGHT   AREA      HEIGHT   AREA 
a1      0.00     0.9       0.6      0.4 
a2      0.0      1.0       1.0      0.0 
a3      0.90     0.2       0.0      1.0 
a4      1.00     0.0       0.0      1.0 
a5      1.00     0.0       0.2      0.6 
a6      1.00     0.0       0.0      1.0 
a7      1.00     0.0       1.00     1.0 
a8      1.00     0.0       1.00     1.0 
a9      1.00     0.0       1.00     1.0 
a10     0.0      1.0       0.0      0.8 

A game where the two meanings are compatible is shown below: 

Game 10008 

a2 is the speaker, a4 is the hearer. 
a2 segments the context into 2 objects: 
  circle-0 square-1 
a2 chooses square-1 as the topic 
a2 considers as salient AREA WIDTH HEIGHT 
a2 categorises the topic as 
  [HEIGHT 0.5-1.0] [HEIGHT 0.75-1.0] 
  [WIDTH 0.5-1.0] [WIDTH 0.75-1.0] 
  [AREA 0.5-1.0] [AREA 0.75-1.0] 
  [HEIGHT 0.5-1.0] [HEIGHT 0.75-1.0] 
a2 has the words 
  wixizode for [HEIGHT 0.5-1.0] (1.0) 
  kilokawe for [WIDTH 0.5-1.0] (1.0) 
  texiraxi for [AREA 0.5-1.0] (1.0) 
  powugeme for [HEIGHT 0.5-1.0] (0.8) 
  wetami for [HEIGHT 0.75-1.0] (0.5) 
  wofetizo for [WIDTH 0.5-1.0] (0.4) 
  kufule for [HEIGHT 0.75-1.0] (0.1) 
  lisexese for [HEIGHT 0.75-1.0] (0.1) 
a2 says: 'wixizode' 
a4 interprets 'wixizode' as 
  [AREA 0.5-1.0] (1.0) 
  [WIDTH 0.0-0.5] (0.0) 
  [HEIGHT 0.5-1.0] (0.0) 
a4 identifies square-1 
a2 signals OK 

Note the abundance of choices for the speaker. He finally picks wixizode. The hearer 
interprets this using area and the game succeeds. A partial overview of the different 
choices and subsequent updating is given in Figure 6.8. The speaker decreases the score 
of powugeme which is also competing for expressing [height 0.5-1.0]. The hearer on 
the other hand damps the alternative meanings of wixizode, which include the meaning 
[height 0.5-1.0] used by the speaker. 

This example shows that there will be a divergence of opinion among the agents if the 
environment does not provide enough disambiguating cases. It is still possible of course 
that the semantic incoherence will disappear from the lexicon, but it is understandable 
that agents have difficulty in this domain disentangling the meanings of words for tall 
and large. French has one word, grand, encapsulating both of these categories. 

Here are the words in the lexicon with still lower frequencies in the population 
(between 0.2 and 0.5). We see more clearly several synonyms in heavy competition, for 
example kodawika and togixa for [gray 0.5-0.75], or vavuvosi and radude for [width 
0.75-1.0]. See Table 6.17. 

Table 6.17: Lexicon with lower frequencies. 

Word       Meaning             Frequency 
lovifo     [hpos 0.5-0.75]     0.21 
petenuga   [hpos 0.5-0.75]     0.25 
gafizuru   [width 0.25-0.5]    0.24 
vavuvosi   [width 0.5-0.75]    0.27 
radude     [width 0.5-0.75]    0.30 
wofetizo   [width 0.75-1.0]    0.42 
wetami     [height 0.75-1.0]   0.40 
donadewe   [height 0.5-0.75]   0.41 
turede     [area 0.0-0.25]     0.26 
likiwewe   [area 0.0-0.5]      0.27 
savifo     [area 0.25-0.5]     0.21 
rapoguwe   [area 0.5-0.75]     0.22 
texiraxi   [area 0.5-1.0]      0.31 
kodawika   [GREY 0.5-0.75]     0.23 
togixa     [GREY 0.5-0.75]     0.45 



So all the mechanisms proposed earlier do what they are supposed to do, even when 
we scale up the population. The Discrimination Game generates the repertoire of dis- 
tinctions necessary in this domain, the Naming Game generates the shared repertoire of 
form-meaning pairs. The coupling between the two based on feedback from the environ- 
ment causes a convergence even if the agents do not have any direct knowledge about 
which meanings are used by the others. 

6.4.2 Lexicon acquisition by new agents 

We finally scale up on the same dimension but now towards an open population. The 
following simulation examines what happens when a new agent enters the population. 
The agent has no prior ontology nor any knowledge of the existing lexicon in the 
group and no additional components or processes are added, compared to the agents in 
the simulation so far. Introducing a new agent tests how far the cognitive architecture 
put in place enables a new agent to acquire a lexicon that already exists. 

The first words learned (after about a dozen games by the agent) are shown in Ta- 
ble 6.18. 



Table 6.18: First words after about a dozen games. 

Word       Meaning            Score 
larubo     [hpos 0.0-0.5]     0.30 
sakezomo   [hpos 0.0-0.5]     0.00 
tituroxu   [hpos 0.5-1.0]     0.50 
tokadapa   [vpos 0.5-1.0]     0.20 
legoka     [height 0.0-0.5]   0.30 
kuvodogi   [width 0.0-0.5]    0.00 
gafizuru   [width 0.25-0.5]   0.10 
nopofi     [width 0.5-1.0]    0.00 



As expected, the new agent sometimes creates new words (this is the case for sakezomo 
and nopofi). But these words are very short-lived. The new agent has already picked up 
larubo, which is the word in use expressing the meaning of sakezomo. Since larubo already 
has a higher score, sakezomo will definitely disappear. 

The most widespread words in the lexicon such as tituroxu or legoka are the ones that 
are most likely to be picked up because the chance that they will be heard is higher. This 
suggests that the entry of new agents in the population does not destabilise the lexicon
but on the contrary makes it more coherent. 1

For example, the new agent has solidly associated the word texiraxi with [height 0.5-1.0]
and wixizode with [area 0.5-1.0], like the majority of the population, thus resolving
the incoherence shown in Figure 6.20.

Notice also that the new agent first acquires words for the more abstract categories. 
This is the case because (1) the discrimination trees are still developing and so more spe- 
cific categories are not yet available, and (2) even if more specific categories are available, 
agents do not try to be more specific than needed in the game. 

Table 6.19 is the set of words of the new agent with scores of 0.4 or higher after 500 total
games (which means more or less 100 games in which the new agent was involved).

Table 6.19: Score of new agent after 500 games.

Word       Meaning             Score
texiraxi   [height 0.5-1.0]    0.80
lepowaxu   [height 0.25-0.5]   0.50
kewenoku   [grey 0.5-1.0]      0.80
wixizode   [area 0.5-1.0]      1.00
vuwusugu   [grey 0.0-0.5]      0.50
tokadapa   [vpos 0.5-1.0]      1.00
tituroxu   [hpos 0.5-1.0]      1.00
legoka     [height 0.0-0.5]    1.00
larubo     [hpos 0.0-0.5]      1.00
fumetese   [width 0.0-0.5]     0.40

The agent clearly picks up the lexicon circulating in the population and generally
associates the same meanings with the words (compare with the lexicons given earlier for
the total population). Occasionally there are still incoherences. For example, fumetese
has been associated with [width 0.0-0.5] whereas the rest of the population uses this
word for a distinction on the vpos channel. The meaning of this word will later shift
as the agent encounters disambiguating cases. We can conclude that the guessing game
shows not only how a lexicon may emerge in a population from scratch but also how new
agents entering the group may acquire the existing lexicon.

1 The importance of a flux in the agent population for streamlining a language has been stressed by Simon
Kirby, who has applied this principle in a remarkable simulation concerning the origins of hierarchical
structure, Kirby (1999).



6.5 Conclusions 

This chapter has coupled the Discrimination Game and the Naming Game, so that agents 
now can play language games without getting explicit feedback about meanings. As in 
the case of humans, feedback only comes through the non-verbal outcome of a game. 
This may generate semantic confusion because usually more than one conceptualisation 
is possible to distinguish a topic from the other objects in the context. However, we have
seen that despite this complication agents still manage to build up an ontology and a 
lexicon which is effective for communicating in their environment. 

Although this chapter took away the assumption of direct meaning feedback, it still
made a number of simplifying assumptions with respect to real world physical agents. In
particular, it assumed that the perception of the scene was identical for the speaker and
the hearer. The next chapter takes away this assumption and thus takes the final step to
test whether a perceptually grounded lexicon may emerge in a population of embodied
distributed autonomous agents.



7 Grounding

Ludwig Wittgenstein went through two major periods in his thinking. During the first
period he worked within the framework of logical empiricism, initiated by Bertrand Russell
and others at the end of the 19th century. The logicist approach defines a language
by listing the elementary building blocks (the words), the combination rules (the syntax),
and a mapping from words and syntactic structures to semantic interpretations (the
semantics). It assumes that there is a universal, logical structure to the world which can be
captured once and for all in a non-ambiguous logical language and that the permissible
inferences can be catalogued exhaustively. This project has its roots in the 17th century
dream of Leibniz and Descartes to streamline rational thinking so that it becomes as
clear and non-controversial as numerical calculation. Wittgenstein enthusiastically
participated in this project, which is still vigorously being pursued today by logicians and
linguists alike, even though many of them do not necessarily subscribe to the universalist
thesis motivating the early developments of logic. 1

But some drastic events happened in Wittgenstein’s life which caused a radical shift 
in his philosophical orientation. One of them was that he became headmaster of a small 
school in the Austrian mountains, without losing his interest in philosophising about
language and meaning. By working with children and by seeing concretely how they 
engaged with language, Wittgenstein realised that the logicist approach did not address 
some basic questions one can ask about language, particularly how the semantics of 
a language might arise, or how people ever develop a shared communication system. 
Wittgenstein saw clearly that a language imposes a certain view on the world, which 
gives each language its own strength as well as its limitations. The logicist framework 
might be ideal as a tool for constructing a post factum formal description of the semantics 
of an (ideal) language, just as a structuralist grammar identifies the frozen state of its 
syntax, but not for modeling how language achieves its communicative purpose, comes 
about, or evolves. 

Wittgenstein proposed to study the use of language in terms of games. This notion 
captures that the most basic form of language involves a social interaction between in- 
dividuals within a specific setting, which acts as a context restricting the possible mean- 
ings. The interaction is played out along a conventionalised set of rules, as is chess or any 
other game. The notion of a game captures that a language interaction has a purpose, for 
example, to identify an object in a context, to gather information, to transmit emotion, 
to try and invoke an action by another person, etc. Words get their meaning as part of a 
language game. Language is a tool fully integrated in the rest of human social activity. 2 

It should be abundantly clear by now that the mechanisms explored in this book are
strongly indebted to a Wittgensteinian point of view. Of course, we studied only one
relatively simple language game, but we have explored it in great depth so that there is
now a framework for studying language games with the same rigor as the framework of
classical logic or structural syntax. 3

1 Wittgenstein’s main work from this period is Wittgenstein (1922). A prototypical example of the logicist
research program is found in Carnap (1928).

2 See in particular Wittgenstein (1953).

We are also exploring many issues that were not raised by Wittgenstein, the most
important one being the origins of words and meanings and how they may spread in a
population. In this chapter, I take the final step on this path, which is to ground the
cognitive architectures of the agents in the world through their perceptual and behavioral
apparatus. I will show concrete examples from a series of language games played
by autonomous robotic agents perceiving real world scenes, and we will see that
phenomena very similar to those observed in the previous chapter emerge.

The second goal of this chapter is to study the semiotic dynamics that thus unfolds 
in the population. Already in the beginning of this book, I introduced the notion of 
a semiotic square, which captures the relation between a referent, a perceived image, 
a meaning, and a form. The collection of semiotic squares that can be observed in an 
agent’s behavior or in the behavior of a group of agents forms a semiotic landscape. This 
landscape undergoes continuous change as new words are created, word-meaning rela- 
tions shift, and meanings are applied to new referents. This chapter introduces additional 
tools to study this dynamics, focusing in particular on how synonymy and ambiguity get 
introduced or damped and how language influences the conceptualisation of reality. 



7.1 A first grounding experiment 

We have arrived at a point where experiments with physically embodied autonomous 
agents become conceivable. To do so, we first have to ground the language game, which 
means that we have to couple the conceptual layer to the sensory-motor layers of the 
agent. 

7.1.1 Integrating perception and action 

The coupling between the perceptual layer and the conceptual layer is, from an
architectural viewpoint, very similar to the coupling of the conceptual layer and the lexical
layer which we studied in the previous chapter. For the formulation of an utterance,
the outputs of the perceptual layer are inputs to the conceptual layer, which in turn
provides inputs to the lexical layer (Figure 7.1). Just like the other layers, the perceptual
layer generates a set of possibilities: a set of possible segmentations and a set of sensory
characteristics about each segment. These possibilities are ranked based on various criteria
such as saliency and made available to the conceptual layer, which further expands on
the best solutions. The conceptual layer in turn generates a set of possible solutions
with different rankings, which are then processed by the lexical layer. Re-entry is
necessary because the final perception of the scene depends on the ontological and lexical
repertoires of the agent and cannot be decided on the basis of sensory processing alone.

3 There have been several attempts to develop a more formal investigation of language games, see one of the
earliest efforts in Hintikka (1998).



Figure 7.1: Flow of solutions through the different cognitive layers when an utterance
is being produced. Re-entry is necessary because the decision at each layer
depends on decisions at subsequent layers.



For the interpretation of an utterance, the main flow of information goes in the other 
direction. The lexical layer sends a variety of hypotheses to the conceptual layer and 
the conceptual layer makes use of data from the perceptual layer to see which concep- 
tualisation yields a referent. Several solutions are considered by the conceptual layer 
because words are typically ambiguous and so the perceptual layer must produce the 
appropriate segmentations and sensory characteristics to test each solution. Even if a 
sensory channel was not considered to be very salient by the hearer, it may have been 
used by the speaker in his conceptualisation of the scene and hence the hearer’s per- 
ceptual processes must actively try to seek in the image the information required to see 
whether it is applicable to the specific context. The interpretation of an utterance thus 
strongly takes the real world context into account. The utterance literally influences the 
way the hearer sees the world. The two-way flow (from perception to conceptualisation 
and language and from language and conceptualisation to perception) resolves one of 
the paradoxes of meaning discussed in Chapter 4. The clear picture of reality that we 
consciously experience results from a dynamical process in which local constraints keep 
propagating until a globally coherent solution emerges. 4 
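
The hearer's side of this process can be sketched as a loop over lexical hypotheses, each of which is tested against the perceived objects. The sketch makes several assumptions that are not in the text: meanings are single categories written as `(channel, low, high)`, hypotheses are tried in order of decreasing score, and a hypothesis is accepted only if it singles out exactly one object. The function names, the lexicon and the context values are all hypothetical.

```python
def holds(category, obj):
    """True if the object's value on the category's channel lies in [low, high)."""
    channel, low, high = category
    value = obj[channel]
    # The upper bound 1.0 is treated as inclusive so that a value of 1.0 is covered.
    return low <= value < high or (high == 1.0 and value == 1.0)


def interpret(word, lexicon, context):
    """Try the hearer's meanings for `word`, best score first.

    Returns (object_name, category) for the first meaning that singles out
    exactly one object in the context, or None if no meaning does.
    """
    hypotheses = sorted(lexicon.get(word, {}).items(), key=lambda item: -item[1])
    for category, _score in hypotheses:
        matches = [name for name, obj in context.items() if holds(category, obj)]
        if len(matches) == 1:
            return matches[0], category
    return None


# Hypothetical hearer lexicon: one ambiguous word with two competing meanings.
hearer_lexicon = {"gofubo": {("hpos", 0.5, 1.0): 1.0, ("area", 0.5, 1.0): 0.1}}

# Hypothetical context: both objects are to the right, only one is large.
context = {
    "obj-0": {"hpos": 0.9, "area": 0.8},
    "obj-1": {"hpos": 0.7, "area": 0.2},
}

referent = interpret("gofubo", hearer_lexicon, context)
```

Here the preferred meaning [hpos 0.5-1.0] matches both objects and is rejected, so the hearer falls back on the weaker meaning, which forces the area channel to be inspected even though it was not the first choice.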

Grounding language games in the real world requires more than a link between
conceptualisation and real world perception. Of equal importance are the physical actions
undertaken by the agents to point to the object they believe to be the topic. The accuracy
of pointing heavily influences overall communicative success. Indeed, a game may
fail not because the speaker and the hearer disagreed on the meaning, nor because
they did not refer to the same object, but because the speaker misperceived which object
the hearer pointed to. Such differences in perception do occur and may cause
communicative failures despite a shared lexicon.

4 Dynamical systems which enter into an attractor based on a similar relaxation process have been widely
studied, particularly in physics. One of the prototypical examples is a spin glass whose magnetic states
keep switching until a globally coherent state is reached.

7.1.2 Concept acquisition 

I now show a series of example games taken from an experiment in which two situated
embodied agents, a1 and a2, play grounded language games based on coloured figures
pasted on the white board in front of them. The following channels are available to the
agents in the experiments in this section: hpos (the horizontal position of the midpoint
of the segment’s bounding box), vpos (the vertical position of the midpoint of the
segment’s bounding box), height (the height of the bounding box), width (the width of
the bounding box), area (the area of the segment, calculated by counting the number
of pixels that belong to it), r (the average redness of the pixels in the segment), g (the
average greenness of the pixels in the segment), b (the average blueness of the pixels
in the segment). These “colour” channels are not to be confused with the human opponent
colour channels, so the distinctions formed on them are not directly perceivable
by human observers.
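
The channel definitions above translate directly into a few lines of code. The following is a minimal sketch, assuming a segment is given as a list of `(x, y, (r, g, b))` pixels. The function name `channels` and this data layout are hypothetical, and the values are raw (pixel units and unscaled colour averages), so the sensor-scaling to the interval between 0.0 and 1.0 is not shown.

```python
def channels(pixels):
    """Compute the raw channel values of one segment.

    `pixels` is a list of (x, y, (r, g, b)) tuples belonging to the segment
    (a hypothetical layout chosen for this sketch).
    """
    if not pixels:
        raise ValueError("a segment needs at least one pixel")
    xs = [x for x, _, _ in pixels]
    ys = [y for _, y, _ in pixels]
    n = len(pixels)
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return {
        "hpos": (left + right) / 2,    # midpoint of the bounding box, horizontal
        "vpos": (top + bottom) / 2,    # midpoint of the bounding box, vertical
        "height": bottom - top + 1,    # bounding box height in pixels
        "width": right - left + 1,     # bounding box width in pixels
        "area": n,                     # number of pixels in the segment
        "r": sum(c[0] for _, _, c in pixels) / n,
        "g": sum(c[1] for _, _, c in pixels) / n,
        "b": sum(c[2] for _, _, c in pixels) / n,
    }


# A hypothetical 2-by-3 pixel segment of uniform colour.
example_segment = [(x, y, (0.25, 0.5, 0.75)) for x in (10, 11) for y in (5, 6, 7)]
example_channels = channels(example_segment)
```

Note that area counts the pixels of the segment itself, so for a non-rectangular shape it is smaller than width times height of the bounding box.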

Because there are only two agents, the risk of synonymy is almost non-existent. The 
saliency threshold is sufficiently low so that more than one sensory channel might be 
salient and hence ambiguity is unavoidable. 

A first word is acquired by the agents in game 3 based on the segmented images shown 
in Figure 7.2 (top). The different sensory values (after sensor-scaling) for the segments 
in game 3 are shown in Table 7.1. 



Table 7.1: Sensory data for game 3.

channel   obj-0   obj-1
HPOS      0.27    0.16
VPOS      0.20    0.20
HEIGHT    0.15    0.15
WIDTH     0.10    0.11
AREA      0.10    0.10
R         0.23    0.25
G         0.32    0.34
B         0.63    0.65

Figure 7.2: Three examples of segmented images. The topic is indicated by a dashed
bounding box in the image of the speaker. Segments which are too small
are ignored. The topics have all been conceptualised as being ‘to the right’
and so the same word gofubo has been used to refer to them.

hpos is the most salient channel. After context-scaling, the two values for hpos are
drawn still further apart, to 1.0 for object-0 and 0.0 for object-1, so that the category
[hpos 0.5-1.0] easily distinguishes the topic (object-0) from object-1.
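
The arithmetic behind this step can be made explicit. The sketch below rests on two assumptions that are not stated in the text: the saliency of a channel is measured as the spread between the largest and smallest value in the context, and context-scaling is a min-max rescaling within the context. Both are consistent with the values 1.0 and 0.0 reported above, but the actual implementation may differ. The function names are hypothetical; the data are those of Table 7.1.

```python
# Sensor-scaled values for (object-0, object-1) in game 3, from Table 7.1.
table_7_1 = {
    "hpos": (0.27, 0.16),
    "vpos": (0.20, 0.20),
    "height": (0.15, 0.15),
    "width": (0.10, 0.11),
    "area": (0.10, 0.10),
    "r": (0.23, 0.25),
    "g": (0.32, 0.34),
    "b": (0.63, 0.65),
}


def spread(values):
    """Difference between the largest and smallest value in the context."""
    return max(values) - min(values)


def most_salient(channel_values):
    """Channel with the largest spread (assumed saliency measure)."""
    return max(channel_values, key=lambda ch: spread(channel_values[ch]))


def context_scale(values):
    """Rescale values so the context minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # nothing to stretch when all values coincide
    return [(v - lo) / (hi - lo) for v in values]


salient_channel = most_salient(table_7_1)
scaled_hpos = context_scale(table_7_1["hpos"])
```

With these definitions the hpos spread of 0.11 is well above that of any other channel, and the two hpos values end up at opposite ends of the interval.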

The left image is that of the hearer a1, the right one that of the speaker, a2. a2 had
already invented the word gofubo for [hpos 0.5-1.0] (to the right) in an earlier game, but
the word was not acquired in that game by a1 because he still lacked the appropriate
distinction. In the meantime the hpos category has become available to a1 due to an
expansion of his discrimination networks and so he can store the word:

Game 3

a2 is the speaker, a1 is the hearer.
a2 segments the context into 2 objects:
    object-0 object-1
a2 chooses object-0 as the topic
a2 categorises the topic as [HPOS 0.5-1.0]
a2 says: gofubo
a1 does not know gofubo
a1 says: gofubo?
a2 points to object-0
a1 categorises the topic as [HPOS 0.5-1.0]
a1 stores gofubo as [HPOS 0.5-1.0]

This game proceeds essentially as in the computer simulations studied before. The
major differences are that real images have now been used, the objects considered are the
outcome of segmentation processes, the sensory characteristics have been derived from
the image itself, and the pointing has been done by physically moving the cameras.
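
The hearer's part of such a game, once the speaker has pointed to the topic, can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: categories are written as `(channel, low, high)`, the hearer adopts the first category in his repertoire that holds for the topic and for no other object, and a new association starts with a score of 0.0. All function names and the context-scaled values are hypothetical.

```python
def holds(category, obj):
    """True if the object's value on the category's channel lies in [low, high)."""
    channel, low, high = category
    value = obj[channel]
    return low <= value < high or (high == 1.0 and value == 1.0)


def distinctive_category(categories, topic, others):
    """First category that holds for the topic and for none of the other objects."""
    for category in categories:
        if holds(category, topic) and not any(holds(category, o) for o in others):
            return category
    return None


def hearer_adopts(lexicon, categories, word, topic, others, initial_score=0.0):
    """Store `word` with a distinctive category for the topic, if the hearer has one.

    Returns the adopted category, or None when the hearer lacks a suitable
    distinction (in which case the lexicon is left unchanged).
    """
    category = distinctive_category(categories, topic, others)
    if category is not None:
        lexicon.setdefault(word, {})[category] = initial_score
    return category


# Game 3 after context-scaling: the topic is at hpos 1.0, the other object at 0.0.
topic, other = {"hpos": 1.0}, {"hpos": 0.0}

a1_lexicon = {}
# Earlier game: a1 has no hpos categories yet, so nothing can be stored.
earlier = hearer_adopts(a1_lexicon, [], "gofubo", topic, [other])
# Game 3: a1's discrimination tree for hpos now has two categories.
hpos_tree = [("hpos", 0.0, 0.5), ("hpos", 0.5, 1.0)]
adopted = hearer_adopts(a1_lexicon, hpos_tree, "gofubo", topic, [other])
```

The two calls mirror the text: the first attempt fails because the distinction is missing, the second succeeds once the hpos categories exist.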

7.1.3 Generalisation without learning 

Immediately the agents apply this word to very different scenes, such as the ones in 
Figure 7.2 (middle and bottom). The middle picture shows on the left the segmented 
scene from the speaker in game 5, and on the right the one from the hearer. In game 
5, the figures are blue rectangles. They are much further apart than the two circles in 
game 3. Nevertheless, after scaling, they are categorised and conceptualised similarly 
and therefore the same word could be used effectively. The concept of [hpos 0.5-1.0]
and hence the word gofubo is general from the very start. The agents do not need to 
see many examples because they do not use inductive generalisation. Instead, the hpos 
category is constructed in a top-down fashion as soon as the hpos channel has been 
salient and is immediately available for use in the discrimination game and hence in 
verbalising the scene. This explains why the word-meaning acquisition process observed 
in the experiments goes so amazingly fast. 

The bottom of Figure 7.2 shows yet another scene where the word gofubo was used
with success. The topic is the rightmost shape in the scene and so [hpos 0.5-1.0] is once
more distinctive. To an outside observer it may look like the agents have performed a
gigantic inductive leap, but this is not the case at all. The agents do not try to abstract the
commonalities from different examples but construct distinctions in a top-down fashion
and try to apply them to the perceived image.
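
Top-down construction of distinctions can be pictured as repeatedly halving the range of a channel. The sketch below assumes that each refinement splits an interval into two equal halves, which matches the categories appearing in this chapter such as [hpos 0.5-1.0], [hpos 0.5-0.75] and [hpos 0.75-0.875]. Unlike the agents' discrimination trees, which are expanded only where needed, the sketch simply follows one path to a fixed depth. The function names are hypothetical.

```python
def refine(interval):
    """Split an interval into its lower and upper half."""
    low, high = interval
    mid = (low + high) / 2
    return (low, mid), (mid, high)


def categories_for(value, depth):
    """Nested categories containing `value`, from most general to most specific."""
    interval = (0.0, 1.0)
    path = []
    for _ in range(depth):
        lower, upper = refine(interval)
        interval = lower if value < lower[1] else upper
        path.append(interval)
    return path


# Any scaled value in the right half falls in the same general category,
# whatever the scene it came from.
right_half = categories_for(1.0, 1)
nested = categories_for(0.8, 3)
```

No examples are needed to arrive at these categories: they exist as soon as the channel is refined, and any context-scaled value can be sorted into them immediately.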

After a mere 50 games, the lexicon shown in Table 7.2 has emerged. Only associations 
with scores greater than 0.0 are shown. 

Table 7.2: Population lexicon after 50 games.

Word       Meaning            Translation       a1    a2
wawosido   [hpos 0.0-0.5]     left              0.4   0.4
meluri     [hpos 0.25-0.5]    medium left       0.1   0.0
gofubo     [hpos 0.5-1.0]     right             1.0   1.0
wiwigapo   [vpos 0.0-0.5]     down              0.1   0.0
fozumoba   [area 0.5-1.0]     large             0.0   0.1
wefoto     [r 0.0-0.5]        low redness       0.2   0.2
togene     [r 0.5-1.0]        high redness      0.5   1.0
fumudanu   [g 0.0-0.5]        low greenness     0.2   0.2
puxedu     [g 0.5-1.0]        high greenness    0.4   0.4



There is already a word (gofubo) which has a score of 1.0! After 100 games, the lexicon
has become more solid and now looks as in Table 7.3. 

Table 7.3: Population lexicon after 100 games.

Word       Meaning              Translation       a1    a2
wawosido   [hpos 0.0-0.5]       left              0.7   0.5
meluri     [hpos 0.25-0.5]      medium left       0.4   0.4
gofubo     [hpos 0.5-1.0]       right             1.0   1.0
vokomutu   [hpos 0.5-0.75]      medium right      0.2   0.2
buwonipo   [hpos 0.75-0.875]    strongly right    0.1   0.0
wiwigapo   [vpos 0.0-0.5]       down              0.6   0.6
fozumoba   [area 0.5-1.0]       large             0.2   0.4
wefoto     [r 0.0-0.5]          low redness       0.2   0.3
togene     [r 0.5-1.0]          high redness      1.0   1.0
fumudanu   [g 0.0-0.5]          low greenness     0.7   0.7
puxedu     [g 0.5-1.0]          high greenness    0.7   0.9

Figure 7.3: The discrimination trees of two embodied agents a1 (left) and a2 (right) after
playing 100 games (top) and after 200 games (bottom).

7.1.4 The influence of the environment 

One thing is striking about this lexicon. It contains words for fine-grained distinctions 
along the horizontal position axis, but none for width or height. Indeed if we look at 
the discrimination trees as they exist at this point (Figure 7.3, top) we see that there are
no discrimination trees for the height and width channels. This is entirely due to the 
environment. There have simply been no situations on the white board yet where these 
distinctions are salient enough. 

As human experimenters, we can stimulate conceptual development by configuring 
scenes where these channels are needed, for example, a scene which contains two ob- 
jects with the same size, colour, and position but of significantly different width. The 
conceptual layer in each agent should then start to develop again and the lexical layer 
should follow with the construction of new words. 

A game where this happens is game 126 with the segmented images shown in Fig- 
ure 7.4 (top). Width is now the most salient channel. The segments and sensory data for 
game 126 (after sensor-scaling) are shown in Table 7.4. 

Clearly width is the most salient channel for the speaker. It is therefore chosen and 
because the agents have already grown distinctions on this channel, discrimination suc- 
ceeds and a new word can be constructed and stored. 

Game 126

a2 is the speaker, a1 is the hearer.
a2 segments the context into 2 objects:
    object-0 object-1
a2 chooses object-0 as the topic
a2 categorises the topic as [WIDTH 0.5-1.0]
a2 creates a new word: vaviwumu
a2 says: vaviwumu
a1 does not know vaviwumu
a1 says: vaviwumu?
a2 points to object-0
a1 categorises the topic as [WIDTH 0.5-1.0]
a1 stores vaviwumu as [WIDTH 0.5-1.0]

Figure 7.4: Top and bottom: Examples of new segmented scenes that stimulate conceptual
development and hence expansions of the lexicon. The most salient characteristic
in the top scene is width and in the bottom scene height.

Table 7.4: Sensory data for game 126.

channel   obj-0   obj-1
HPOS      0.08    0.09
VPOS      0.21    0.10
HEIGHT    0.10    0.15
WIDTH     0.42    0.11
AREA      0.34    0.16
R         0.33    0.31
G         0.68    0.65
B         0.52    0.50

Figure 7.4 (bottom) contains another example of segmented images which have stimulated
conceptual growth and hence an expansion of the lexicon. In this case, categorisations
on the height channel are relevant and a word is developed for ‘tall’.

Figure 7.5 shows another series of image segments, where categorisations based on the 
area channel were effective and Figure 7.6 shows additional image segments exercising 
vpos, hpos and colour distinctions. 

After 200 games, the lexicon of the population looks as in Table 7.5. 

Table 7.5: Population lexicon after 200 games.

Word       Meaning              Translation       a1    a2
wawosido   [hpos 0.0-0.5]       left              0.9   0.4
gixepo     [hpos 0.0-0.25]      very left         0.4   0.4
wonuxa     [hpos 0.0-0.25]      very left         0.1   0.0
meluri     [hpos 0.25-0.5]      medium left       0.4   0.4
gofubo     [hpos 0.5-1.0]       right             1.0   1.0
vokomutu   [hpos 0.5-0.75]      medium right      0.4   0.4
buwonipo   [hpos 0.75-0.875]    strongly right    0.1   0.0
wiwigapo   [vpos 0.0-0.5]       down              0.5   0.4
putuwenu   [vpos 0.5-1.0]       up                0.5   0.5
vaviwumu   [width 0.5-1.0]      wide              0.5   0.5
pesidumu   [area 0.0-0.5]       small             0.2   0.2
fozumoba   [area 0.5-1.0]       large             1.0   1.0
wefoto     [r 0.0-0.5]          low redness       0.2   0.1
togene     [r 0.5-1.0]          high redness      0.9   1.0
fumudanu   [g 0.0-0.5]          low greenness     0.7   0.9
puxedu     [g 0.5-1.0]          high greenness    0.9   0.9


Note that new words have come into the lexicon for ‘up’ (putuwenu), ‘down’ (wiwigapo),
and ‘wide’ (vaviwumu). The discrimination trees of the width and height
channels have started to expand (Figure 7.3, bottom).

Figure 7.5: Segmented images where distinctions on the area channel have been used.
The topic (with dashed bounding box) in the two top cases has been categorised
as large. In the bottom case, a word meaning ‘small’ was used.

Figure 7.6: The image segments from the top game require distinctions on the vpos channel
and caused the creation of a word for ‘upper’. The image segments in the
middle led to the use of refined distinctions on the hpos channel, so as to
identify the middle square. The topic in the image segments at the bottom
was identified with a conjunction of categories: blue and tall.

7.1.5 Coping with perceptual anomalies

Grounding language games in physical environments obviously makes things harder for the
agents in a number of respects. First of all, because perception and segmentation may
differ, the salient characteristics of objects may not be the same and as a result the hearer
may guess a different meaning for an unknown word from the one used by the
speaker. This happens, for example, in the following game (game 128), which is based on
the segmented images shown in Figure 7.7 (top). The speaker’s perception is shown on
the left and the hearer’s on the right.

Game 128

a1 is the speaker, a2 is the hearer.
a1 segments the context into 3 objects:
    object-0 object-1 object-2
a1 chooses object-0 as the topic
a1 categorises the topic as
    [HPOS 0.0-0.25] [HPOS 0.0-0.125]
a1 creates a new word: 'bagaxe' for [HPOS 0.0-0.25]
a1 says: bagaxe
a2 does not know bagaxe
a2 says: bagaxe?
a1 points to object-0
a2 categorises the topic as [AREA 0.5-1.0]
a2 stores bagaxe as [AREA 0.5-1.0]

Although the difference may not appear significant to the human eye, the topic (the
leftmost rectangle) is narrower in the raw perception of the speaker than the same
rectangle as perceived by the hearer. The segments and sensory data of the speaker in
game 128 (after sensor-scaling) are shown in Table 7.6.

Table 7.6: Sensory data for game 128.

channel   obj-0   obj-1   obj-2
HPOS      0.03    0.15    0.33
VPOS      0.29    0.29    0.01
HEIGHT    0.37    0.39    0.07
WIDTH     0.10    0.08    0.27
AREA      0.29    0.22    0.16
R         0.95    0.97    0.97
G         0.33    0.40    0.93
B         0.35    0.44    0.26

Figure 7.7: Examples of scenes causing confusion due to perceptual anomalies. In the top
scene, the segmentation of the speaker (left) makes the topic’s most salient
characteristic the horizontal position, whereas for the hearer (right) the most
salient characteristic of the same segment is its area. In the bottom scene,
speaker and hearer have segmented the scene into two and three
segments respectively. This causes the game to fail even though both agents had the same
meaning for the word used by the speaker.



Context-scaling further amplifies this difference, which shows that it is not always
beneficial to apply it. a1 uses the horizontal axis, creating a new word bagaxe meaning ‘very
left’ [hpos 0.0-0.25], whereas a2, for whom the area is the most salient characteristic,
associates this word with [area 0.5-1.0] (large). This example shows that grounding
introduces additional risks for the introduction of semantic incoherence in a group’s
lexicon.

Due to perceptual anomalies, a game may fail even though both agents already have
the same meaning for the same word. This happens when this meaning yields different
objects (or no objects at all) for the speaker and the hearer. As a consequence, the hearer
adopts another meaning for the word, which starts to compete with the one he already
had. This happened in the following game (game 127), based on the segmented images
shown in Figure 7.7 (bottom). The speaker (left image) has identified only two objects,
but the hearer three. The speaker uses green as the distinguishing characteristic,
which is indeed appropriate, but because the hearer’s third object (the top rectangle) is
also green, this distinction fails for him. Even though the hearer already had a
well-established meaning for puxedu, he associates a new meaning based on the width channel
with this word because this channel is now the most salient.

Game 127

a1 is the speaker, a2 is the hearer.
a1 segments the context into 2 objects:
    s-object-0 s-object-1
a1 chooses object-0 as the topic
a1 considers as salient G R AREA HPOS
a1 categorises the topic as
    [HPOS 0.0-0.5] [HPOS 0.0-0.125]
    [AREA 0.0-0.5] [R 0.0-0.5] [G 0.5-1.0]
a1 has the words
    puxedu for [G 0.5-1.0] (1.0)
    wawosido for [HPOS 0.0-0.5] (0.7)
    pesidumu for [AREA 0.0-0.5] (0.2)
    wefoto for [R 0.0-0.5] (0.2)
    buwonipo for [G 0.5-1.0] (0.0)
a1 says: puxedu
a2 segments the context into 3 objects:
    h-object-0 h-object-1 h-object-2
a2 interprets puxedu as
    [G 0.5-1.0] (0.20)
a2 identifies h-object-0 h-object-1
a2 says: puxedu?
a1 points to s-object-0
a2 categorises the topic as [WIDTH 0.5-1.0]
a2 stores puxedu as [WIDTH 0.5-1.0]

After 500 games, the lexicon is as in Table 7.7. 

The global evolution of success and ontology is shown in the graph in Figure 7.8. The 
graph is not fundamentally different from the ones we have seen in the computer simu- 
lations in the previous chapter, except that more failures occur after a lexicon is estab- 
lished due to the contingencies of real world images and the unavoidable stochasticity 
associated with physical interactions. 

So despite the difficulties caused by perceptual anomalies and errors in non-verbal
real-world interaction, the mechanisms used by the agents appear sufficiently robust that a
shared communication system manages to get off the ground. We have been able to take
away all the scaffolds put up in earlier chapters and the complete language system now
stands on its own feet. Of course we now need to further scale up the challenge to the
agents, particularly along three dimensions: complexity of the environments, complexity
of the sensory-motor apparatus (particularly the number of sensory channels available to
the agents), and size of the agent population. Only the last type of scale-up is studied
in the remainder of this chapter.

Table 7.7: Population lexicon after 500 games.

Word       Meaning              Translation       a1    a2
wawosido   [hpos 0.0-0.5]       left              0.9   0.4
gixepo     [hpos 0.0-0.25]      very left         0.4   0.4
wonuxa     [hpos 0.0-0.25]      very left         0.1   0.0
meluri     [hpos 0.25-0.5]      medium left       0.4   0.4
gofubo     [hpos 0.5-1.0]       right             1.0   1.0
vokomutu   [hpos 0.5-0.75]      medium right      0.4   0.4
buwonipo   [hpos 0.75-0.875]    strongly right    0.1   0.0
wiwigapo   [vpos 0.0-0.5]       down              0.5   0.4
putuwenu   [vpos 0.5-1.0]       up                0.5   0.5
vaviwumu   [width 0.5-1.0]      wide              0.5   0.5
pesidumu   [area 0.0-0.5]       small             0.2   0.2
fozumoba   [area 0.5-1.0]       large             1.0   1.0
wefoto     [r 0.0-0.5]          low redness       0.2   0.1
togene     [r 0.5-1.0]          high redness      0.9   1.0
fumudanu   [g 0.0-0.5]          low greenness     0.7   0.9
puxedu     [g 0.5-1.0]          high greenness    0.9   0.9




Figure 7.8: Success (left y-axis) and average ontology size (right y-axis) for two agents 
playing 500 guessing games about real world scenes perceived through their 
cameras. Occasionally new situations have been introduced to stimulate con- 
ceptual development. 




7.2 Semiotic dynamics 

Semiotic dynamics refers to the changing relationships between words, meanings, per- 
ceptions, and real world scenes observed while a group of autonomous distributed agents 
play language games about scenes from an open evolving environment. The possible ut- 
terances and possible meanings are not fixed but continuously changing as the agents au- 
tonomously evolve their communication systems and adapt to changing environments. 
Tracking and understanding these changes is a non-trivial task. It is comparable to the 
investigation of other non-linear complex dynamical systems and therefore similar tools 
are useful. 5 The study of semiotic dynamics is an entirely new subject for linguistics, and 
I will give here only some examples to illustrate the approach. 

7.2.1 Tracking language evolution 

The first thing we need is a systematic way to collect data. In studying the lexical and 
ontological development of the agents, I have so far played god, inspecting the internal 
states of the agents. With larger agent populations that are travelling over the Internet 
and engage in interactions in different physical sites, it becomes impossible to perform 
these computations, because ontologies and lexicons are distributed over many agent 
servers throughout the world and different language games are going on in parallel. We 
are forced in these circumstances to adopt the viewpoint of a linguist, who can only 
observe the overt linguistic behavior in the community, not the internal states of each 
individual. So we have built tools that track the language games as they take place in 
parallel on a world-wide scale. The tools are available to anyone who logs on through 
the Internet and wants to see for him or herself how the language system is evolving. 6 

Of course, an external observer’s point of view is only partial. As we have seen in the
previous chapters, many words and categories are latently known by the agents without
being used in overt behavior, just as we carry genes in our bodies that are not being
expressed. Nevertheless, observing the actual behavior of the agents is in a sense a
more natural way to characterise ontologies and lexicons, and it reflects the semiotic
dynamics in the population well. For the remainder of this chapter, all data are taken from
observing experiments with the Talking Heads as they play grounded language games.

7.2.2 Semiotic landscapes 

A semiotic landscape contains all the semiotic relationships that effectively occurred
at least once in the games played by a particular population of agents during a certain
period of time. The semiotic landscape is a graph. The nodes in the graph are formed
by situations (referents in a specific context), meanings, and forms (words), and there
are links if the items associated with two nodes indeed co-occur (Figure 7.9). In a more
complete picture, the landscape makes a distinction between external referents and
segmented images but I will not do so in the present chapter. The relations in the landscape
are labeled as in Table 7.8.

5 See Badii & Politi (1997). Examples of ways to model evolutionary systems are shown in Kauffman (1993)
and Maynard Smith (1989).

6 The website http://talking-heads.csl.sony.fr/ contains the latest statistics on this world-wide evolution. The
observational tools were mainly built by Joris Van Looveren and Frederic Kaplan.

Figure 7.9: Typical segment of a semiotic landscape capturing the co-occurrence
relations between referents, meanings and word forms. The FM/MF relations are
in thick lines, the RM/MR relations in regular lines, and the RF/FR relations
in dashed lines.

The partial landscape in Figure 7.9 (taken from an actual experiment) contains an
example where the agents use two possible meanings for conceptualising situation2, namely
[gray 0.0-0.25] (very light) and [hpos 0.5-0.75] (medium right). The words ‘katapu’ and
‘tisame’ lexicalise [gray 0.0-0.25], and ‘wobo’ and ‘tisame’ lexicalise [hpos 0.5-0.75]. Each
meaning is therefore expressed by two synonyms, and ‘tisame’ is ambiguous: it can mean both
[gray 0.0-0.25] and [hpos 0.5-0.75]. Three words are used to communicate the situation:
‘katapu’, ‘tisame’, and ‘wobo’. Such a structure is typical for grounded lexicon evolution,
and complexity rapidly increases when the same meanings are used to communicate about other
situations (which is obviously very common and indeed desirable).
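
The landscape fragment just described can be written down as a small graph. The
following Python is only an illustrative sketch, not the tooling used in the experiment:
the Game record, its field names and the build_landscape helper are assumptions made for
this example, and the four game records are invented so that they reproduce the
situation2 fragment.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    """One observed language game (hypothetical record layout)."""
    referent: str   # situation: a referent in a specific context
    meaning: str    # category used to conceptualise the referent
    form: str       # word used to express the meaning


def build_landscape(games):
    """Return co-occurrence sets for the six relations RM, MR, FM, MF, RF, FR."""
    landscape = {label: defaultdict(set)
                 for label in ("RM", "MR", "FM", "MF", "RF", "FR")}
    for g in games:
        landscape["RM"][g.referent].add(g.meaning)
        landscape["MR"][g.meaning].add(g.referent)
        landscape["FM"][g.form].add(g.meaning)
        landscape["MF"][g.meaning].add(g.form)
        landscape["RF"][g.referent].add(g.form)
        landscape["FR"][g.form].add(g.referent)
    return landscape


# Invented records reproducing the situation2 fragment.
games = [
    Game("situation2", "[gray 0.0-0.25]", "katapu"),
    Game("situation2", "[gray 0.0-0.25]", "tisame"),
    Game("situation2", "[hpos 0.5-0.75]", "wobo"),
    Game("situation2", "[hpos 0.5-0.75]", "tisame"),
]
landscape = build_landscape(games)

# Forms with more than one meaning are ambiguous;
# meanings with more than one form have synonyms.
ambiguous = sorted(f for f, ms in landscape["FM"].items() if len(ms) > 1)
synonymous = sorted(m for m, fs in landscape["MF"].items() if len(fs) > 1)
print(ambiguous)    # ['tisame']
print(synonymous)   # ['[gray 0.0-0.25]', '[hpos 0.5-0.75]']
```

Storing each relation in both directions is redundant, but it keeps every lookup
a single dictionary access.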



Table 7.8:

Label   Relation
RM      referent to meaning
MR      meaning to referent
FM      form to meaning
MF      meaning to form
RF      referent to form
FR      form to referent





7.2.3 Competition diagrams 

The degree of coherence of a language system can be studied by collecting data on the
frequency of the members of each relation in the semiotic landscape for given periods
of time (for example periods of 100 games). The result is represented in competition
diagrams, such as the RF-diagram in Figure 7.10, taken from actual experiments. It plots
the evolution of the frequency of the referent-form relations for a given referent. In other
words, all games during a certain period in which this particular situation (i.e.
a specific referent in the same context) occurred are collected, and then the frequencies
of all words used to refer to this referent in the same series of games are computed.
Similar diagrams can be constructed for the other semiotic relationships. The FR-diagram
plots all the referents for a given form, the MF-diagram all the forms for a given meaning,
the FM-diagram all the meanings for a given form, the MR-diagram all the referents for a
given meaning and the RM-diagram all the meanings for a given referent.
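
The data behind an RF-diagram can be computed along the following lines. This is a
minimal sketch under stated assumptions: the rf_competition function, the (referent,
form) input format and the toy game list are hypothetical and chosen for illustration;
they are not the observational tools used in the experiment.

```python
from collections import Counter, defaultdict


def rf_competition(games, referent, period=100):
    """Relative frequency of each form used for `referent`, per period.

    `games` is a sequence of (referent, form) pairs in the order played.
    Returns {period_index: {form: relative_frequency}}.
    """
    counts = defaultdict(Counter)
    for i, (ref, form) in enumerate(games):
        if ref == referent:
            counts[i // period][form] += 1
    diagram = {}
    for p, counter in sorted(counts.items()):
        total = sum(counter.values())
        diagram[p] = {form: n / total for form, n in counter.items()}
    return diagram


# Invented toy data: one form dominates early, another takes over later.
games = ([("sit1", "desepu")] * 3 + [("sit1", "va")] * 1
         + [("sit1", "desepu")] * 1 + [("sit1", "va")] * 3)
diagram = rf_competition(games, "sit1", period=4)
print(diagram[0])   # {'desepu': 0.75, 'va': 0.25}
print(diagram[1])   # {'desepu': 0.25, 'va': 0.75}
```

The diagrams for the other relations follow by feeding the same function pairs
drawn from a different relation, for example (form, meaning) pairs for an FM-diagram.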




Figure 7.10: This RF-diagram shows the frequency of all forms used for the same referent 
in 3000 language games, played by a group of 20 embodied situated agents. 

The RF-diagram in Figure 7.10 shows the frequency with which certain words were
used to communicate a particular situation. We see that in the beginning the word
‘desepu’ has been dominant, then there is a period of turbulence in which different words
compete, but after a while a new word va wins the competition and becomes the
dominant way to communicate about this situation. Figure 7.11 shows another competition
diagram plotting the evolution of the FM-relation for the word form va, in other words
the frequencies of all the meanings that co-occurred with the word va. We see an early
peak when va was used in 70 % of the games with the meaning [b 0.3125-0.375], i.e. a
particular shade of blue. Then there is a struggle during which additional distinctions (on
the red and vpos-channels) are competing for the dominant meaning of va. [r 0.0-0.125]
finally becomes dominant.

Figure 7.11: This FM-diagram shows the frequencies of each form-meaning pair with the
form equal to va in a series of 5000 games. A disambiguating situation occurs
in game 3000 causing the loss of one meaning of va.



These competition diagrams are an important tool to try and make sense of the onto- 
logical and lexical evolution taking place in evolving groups of agents as they are playing 
their language games. Typically we pick a dominating word, for example the word va 
in Figure 7.10, and try to understand why it has become dominant. The FM-diagram for 
va (Figure 7.11) explains part of the story. Three stable meanings for va have emerged 
at around 1000 games: [r 0.0-0.125], [b 0.3125-0.375], and [vpos 0.25-0.5]. They are all 
equally adequate for distinguishing the object va designates, and there are no situations 
yet that would have forced the disambiguation of va. In game 3000, the environment 
(which is continuously changing in this experiment) produces a scene in which a cate- 
gory which was distinctive for the object designated by va is no longer distinctive. The 
lexicon adapts immediately. Around game 3000 the vpos-based meaning disappears, and 
the distinction based on red shoots up and becomes dominant. 

7.2.4 RMF coherence 

The average frequency of the dominating relations along a particular semiotic dimension
is an indication of how coherent the community’s language system is along that dimension.
For example, suppose we want to know the coherence along the meaning-form dimension,
in other words whether there are many synonyms in the lexicon or not. For a given
series of games, we calculate for each meaning that was indeed used in the series the
frequency of the most common form for that meaning. Then we take the average of
these frequencies and this represents the MF-coherence. If all meanings had only one
form, the MF-coherence is equal to 1.0. If every meaning was expressed by two forms
with equal frequency, the MF-coherence is 0.5. When plotting the MF-coherence we can
therefore follow the tendency towards an increase or decrease of synonyms.
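
The calculation can be made concrete with a short sketch. The coherence function and
the toy observations below are assumptions introduced for illustration only; the
function simply averages, over all sources that occurred, the relative frequency of
the most common target.

```python
from collections import Counter, defaultdict


def coherence(pairs):
    """Average frequency of the dominant partner along one dimension.

    `pairs` is an iterable of (source, target) observations, e.g.
    (meaning, form) pairs for MF-coherence.
    """
    by_source = defaultdict(Counter)
    for source, target in pairs:
        by_source[source][target] += 1
    if not by_source:
        raise ValueError("no observations")
    dominant = [max(c.values()) / sum(c.values()) for c in by_source.values()]
    return sum(dominant) / len(dominant)


# Every meaning has exactly one form.
one_form = [("m1", "w1")] * 4 + [("m2", "w2")] * 2
print(coherence(one_form))      # 1.0

# One meaning, two forms used equally often.
two_forms = [("m1", "w1"), ("m1", "w2")] * 3
print(coherence(two_forms))     # 0.5
```

Passing pairs from another relation, such as (form, meaning) or (referent, meaning),
gives the coherence along that dimension with the same code.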

Examining the coherence along the other dimensions is equally instructive. Studying 
the coherence along the FM dimension informs us about the degree of ambiguity in the 
lexicon, because it is based on the average frequency of the preferred meaning for each 
word. When all word forms have only one meaning, the FM-coherence is 1.0. The more 
the forms in a language have different meanings with non-zero frequencies, the lower 
the FM-coherence becomes. 

The coherence along the RM dimension informs us about how many possible concep- 
tualisations of the same situation are used by the population. If RM-coherence is high, 
this means that the population has uniform conceptualisations. For every referent, all 
agents typically use the same meaning. Usually there are initially many possible ways 
to conceptualise a scene, but there is a tendency for agents to view the world in a similar 
way under the influence of language because the scores of the discrimination trees are 
affected by success of a distinction in language games. I will discuss this in more detail 
in the final section of this chapter. 

The inverse relation (MR, between meanings and referents) tracks the frequency with 
which certain situations are covered by specific meanings. It informs us about the gen- 
erality of the categories available to the agents, assuming that all agents statistically 
encounter the same sort of environments. If a particular meaning can pick out many 
possible referents in many contexts, this meaning must be abstract and the agents must 
have managed to develop a lexicon that is not tied up completely to specific situations. 



7.3 The ideal language 

Complex adaptive systems show a tendency to optimise their internal functioning
despite the absence of a global control center. For example, each species in a
single ecology tends to become better at exploiting its niche and the global system moves
towards a balanced equilibrium where different species play different roles in a
complex ecological web. 7 This book explores the idea that language is a complex adaptive
system like a natural ecology, which is shaped and reshaped by the local interactions
of autonomous agents without a central controlling agency regulating what linguistic
conventions should be adopted. We have seen that in computer simulations and
experiments with robotic agents a shared communication system emerges, but to what
extent is this system optimal? 8

7.3.1 Total coherence 

A first desirable property of a language system is perhaps that it is totally coherent along
all semiotic dimensions. In terms of semiotic landscapes, this means that the graph
consists of unconnected triangles. Each object in a specific context has a unique meaning,
each meaning has a unique form and picks out a unique referent, and each form has a
unique meaning and hence a unique referent as well. The coherence along all possible
dimensions is then always 1.0. Given such a system, the agents would not need to consider
different hypotheses while speaking or listening, there would never be any confusion,
and a new agent learning the language would never be confronted with different uses
of the same word.

7 See examples in Margulis (1991).

8 See for the discussion on whether languages ever can be said to optimise: Kirby (1999).

Such an ideal language has been a dream of many philosophers, including Descartes
and Leibniz, but the investigations reported here show clearly that it is not attainable. 9 A
language must be open to the expression of new meanings because the communicative
objectives and the environments which are the subject of communication keep changing.
Hence synonymy (incoherence along the MF dimension) is unavoidable because words
are created by some agents not knowing that words already exist in the population for
the same meaning. In addition, one agent may “wrongly” infer that a certain word has
a particular meaning due to perceptual anomalies, even though the same word already
had another meaning in the lexicon. Word-meaning pairs that thus arise may start to
propagate in the population and actually supersede the “original” word-meaning pairs.

Ambiguity (incoherence along the FM dimension) is unavoidable as well for similar 
reasons. The same situation can often be conceptualised in more than one way and so an 
agent guessing the meaning of an unknown word or trying to make sense of a word in a 
particular context may easily derive another meaning than the one used by the speaker. 
Perceptual anomalies further aggravate the risk that a certain form becomes associated 
with another meaning by the hearer, as we have seen in some grounded example games 
earlier on. 

Different conceptualisations of the world (incoherence along the RM and MR dimen- 
sions) are even harder to avoid because every agent develops its own ontologies inde- 
pendently and without any direct feedback from the other agents. The ideal language 
system is not only impossible to attain for autonomous agents engaged in grounded lan- 
guage games, it would be very inefficient to store and use as well because new words 
and meanings would be required for every new situation ever encountered. The larger 
the lexicon, the harder the task of a language learner to acquire it. So there is a trade-off
between coherence and expressibility. This is why natural languages constantly try to 
recruit existing words to keep down the repertoire of forms that have to be stored and 
hence learned. 

7.3.2 Communicative success despite incoherence 

A grounded language system cannot be fully coherent. This implies that communication
among autonomous embodied agents can only work if their internal architectures are
capable of handling incoherence. Of course, incoherence does not necessarily impinge
on communicative success. Alternative conceptualisations may be compatible with the
same situations. Agents may not even realise that their language systems are different
because even though their words have different meanings these meanings may always
pick out the same topic. Thus the RMF-landscape in Figure 7.9 leads to total success for
communicating situation2. Even if a speaker uses ‘tisame’ to mean [gray 0.0-0.25] and
the hearer understands ‘tisame’ to mean [hpos 0.5-0.75], they still have communicative
success. The goal of the language game is to find the referent. It does not matter whether
the meanings are the same. The agents cannot even know which meaning the other one
uses because they have no access to each other’s internal states.

9 The history of this search for such an ideal language is discussed by Eco (1997).

The architecture of the agents in the Talking Heads experiment has been carefully de- 
signed so that incoherence can be handled. The context is at every level strongly taken 
into account. When producing an utterance, the meaning of a word is partly determined 
by whether that meaning makes sense in the specific context of the language game. This 
makes it possible to handle ambiguity. To handle synonymy, agents store several words 
in their lexicons so that they can understand more words than the ones they prefer them- 
selves. Agents maintain different ways to conceptualise reality so that they can apply 
conceptualisations used by others even though they would not prefer these themselves. 

The agents behave in their own selfish interest to maximise success in the game. They
increase the score of the word-meaning pairs that yielded success and decrease the score
of those that resulted in failure, so that next time around they are more likely to use a
pair that worked. But a side effect of this behavior is that synonymy and ambiguity get
damped. Success depends on use in the rest of the population. The more agents use a
word, the more it will have success, the higher the scores will be, and the more it will
be used. The global effect is that the language becomes more coherent without a central
coordinator.
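
The feedback loop between score and use can be sketched as follows. This is a
hypothetical illustration, not the agents’ actual update rule: the step size delta, the
initial score of 0.5, the clamping to [0, 1] and both function names are assumptions
of the sketch.

```python
def update_score(lexicon, form, meaning, success, delta=0.1):
    """Reinforce or weaken one word-meaning pair after a game.

    `lexicon` maps (form, meaning) to a score in [0, 1]. The step size
    `delta` and the clamping are assumptions of this sketch.
    """
    key = (form, meaning)
    old = lexicon.get(key, 0.5)
    new = old + delta if success else old - delta
    lexicon[key] = min(1.0, max(0.0, new))
    return lexicon[key]


def preferred_form(lexicon, meaning):
    """Pick the highest-scoring form the agent knows for `meaning`."""
    candidates = {f: s for (f, m), s in lexicon.items() if m == meaning}
    return max(candidates, key=candidates.get)


# Two synonyms start with equal scores; one succeeds, the other fails.
lexicon = {("katapu", "[gray 0.0-0.25]"): 0.5,
           ("tisame", "[gray 0.0-0.25]"): 0.5}
update_score(lexicon, "katapu", "[gray 0.0-0.25]", success=True)
update_score(lexicon, "tisame", "[gray 0.0-0.25]", success=False)
print(preferred_form(lexicon, "[gray 0.0-0.25]"))   # katapu
```

Because the preferred form is the one used in the next game, a single success
already tilts future use towards that word, which is the damping effect on synonymy.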



7.4 Damping synonymy and ambiguity 

To conclude this chapter, I now discuss in more detail an example of this kind of semi- 
otic dynamics as gleaned from an actual experiment with twenty Talking Heads, taking 
turns to materialise themselves in two robotic bodies at a single physical site. The agents 
have only R, G, B, and gray-scale channels. This case study illustrates how the tools intro- 
duced in the previous section help us to make sense of the very complex lexical evolution 
spontaneously arising in the system. We have set up the experiment in such a way that 
the agents first see a limited set of objects. Then we have progressively added new situ- 
ations to the environment (by pasting new figures on the white board or reconfiguring 
existing figures) and studied the impact on the lexicons and ontologies of the agents. 10 

Figure 7.12 plots the global result of the experiment for 35,000 games. It shows the pro- 
gressive increase in environmental complexity (after every 5000 games) and the average 
communicative and discriminative success in the game. During the final 15,000 games, 
no new objects were introduced. 

We see clearly that the agents manage to bootstrap a successful lexicon from scratch
in the first 1000 games. Success then drops every time the environment increases in
complexity but recovers as the agents invent new words or create new meanings.
Progressively it is less and less difficult to handle increased environmental complexity
because distinctions are already available to cope with the novel situations, words are less
ambiguous, and the lexicon is covering more and more meanings.

10 These experiments were done in close collaboration with Frederic Kaplan, see for example Steels &
Kaplan (1999).

Figure 7.12: The graph shows the communicative and discriminatory success for a series
of 35000 language games played by a group of 20 Talking Heads. The
environment has progressively been made more complex.

7.4.1 The story of fepi 

Only looking at macroscopic measures like communicative success hides away the in- 
teresting rich lexical dynamics that unfolds in the population. Let me examine just one 
word, fepi. By looking at the FR-diagram, we see that this word is used consistently for 
identifying two objects (03 and 05) in a certain set of contexts (Figure 7.13). 

We can see the meanings of fepi, by inspecting its FM-diagram (Figure 7.14). The 
dominant meaning of fepi is a particular shade of green [g 0.25-0.5]. There are some 
other competing meanings (including [gray 0.25-0.3125]) but most of them are hardly 
ever used. We observe clearly the tendency for ambiguity to get damped. 

What about synonymy? Let us look at the MF-diagram of [g 0.25-0.5], so that we
can see whether there are any other words in use for expressing the same meaning
(Figure 7.15). fepi has indeed emerged as dominant for this meaning, but this has not
been without an intense struggle. The tendency for synonymy to get damped is clearly
present. Even though the lexicon occasionally destabilises, the lateral inhibition and
the positive feedback loop between use and success cause self-organised MF-coherence.
There is however something curious going on. In the early phases xu was dominant. Why
did it destabilise and how has fepi managed to become the dominant expression for
[g 0.25-0.5]?

Figure 7.13: FR-diagram showing the frequencies of the objects referred to by the word
fepi. fepi is consistently used for 03 as well as 05.




Figure 7.14: FM-diagram showing the different meanings of fepi. One meaning [g 0.25- 
0.5] dominates. 

Figure 7.15: MF-diagram showing the different words circulating in the population for
expressing the category [g 0.25-0.5]. First xu dominates and then fepi wins
the competition after an intense struggle.

7.4.2 The story of xu 

When inspecting the game traces in more detail, we see that fepi is created in game 328
by agent-3, playing the role of speaker, in order to refer to object 03 using the meaning
[g 0.25-0.5]. Agent-19, the hearer in the same game, acquires the same meaning,
[g 0.25-0.5], for fepi. In one sense, we could say that agent-19 has learned this meaning
of fepi from agent-3 but that is not entirely accurate. Agent-19 has constructed a possible
meaning for fepi and this happened to be the same as the one used by agent-3, but this is
partly accidental. Agents only indirectly learn the language from others. They construct a
language which is compatible with the language used by others. Coherence among the
individual language systems occurs through the positive feedback between language use
and communicative success.

fepi entered into the lexicon to refer to 03, but we see from the RF-diagram for 03
(Figure 7.16) that xu was already well established for 03 and initially fepi had no success
at all. So the puzzle is still there: how did fepi manage to overtake xu?

Let us look at the different meanings of xu on a magnified scale by inspecting the 
FM-diagram of xu (Figure 7.17). Only the first 10,000 games are shown because after that 
xu is no longer used. In the first 5000 games, xu has the same dominant meaning as fepi, 
namely [g 0.25-0.5]. There are some other meanings associated with xu, which are all 
effective to conceptualise 03. 

Figure 7.16: RF-diagram showing the different words being used for identifying 03.
Initially xu dominates.







Figure 7.17: FM-diagram showing the different meanings of xu. After game 5000, the
meaning of xu becomes unclear and the word falls into disrepute.

7.4.3 The entry of 05

The mystery is unveiled by looking at what happened when 05, a new object, entered
the environment around game 5000. The word xu now not only picks out 03 in certain
contexts but 05 as well. Hence games where both objects occur fail and
consequently the association between xu and [g 0.25-0.5] weakens. Closer examination
reveals that 05’s green value is a bit lower (in the range [0.25,0.375]) than that of 03
(which is in the range [0.375,0.5]), so that a more refined distinction on the G-channel is
necessary if both objects occur in the same context.
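
The need for a finer distinction can be checked with a few lines of code. This is a
sketch under stated assumptions: the two green-channel values are hypothetical numbers
picked inside the ranges quoted above, the intervals are treated as half-open, and the
helper names are invented for the example.

```python
def in_range(value, low, high):
    """True if value falls in the half-open interval [low, high)."""
    return low <= value < high


def is_distinctive(category, topic, others):
    """A category discriminates the topic if it holds for the topic
    and for none of the other objects in the context."""
    low, high = category
    return in_range(topic, low, high) and not any(
        in_range(o, low, high) for o in others)


# Hypothetical green-channel values chosen inside the quoted ranges.
g_o3, g_o5 = 0.45, 0.30

coarse = (0.25, 0.5)      # the category [g 0.25-0.5]
fine_hi = (0.375, 0.5)    # upper half of that category
fine_lo = (0.25, 0.375)   # lower half of that category

print(is_distinctive(coarse, g_o3, [g_o5]))    # False
print(is_distinctive(fine_hi, g_o3, [g_o5]))   # True
print(is_distinctive(fine_lo, g_o5, [g_o3]))   # True
```

When only one of the two objects is in the context, the coarse category is still
distinctive, which is why a word carrying it does not fail in every game.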







Figure 7.18: RF-diagram showing the different words used for 05. Initially a word rimebi
dominates. It destabilises and fepi takes over.

As seen from the RF-diagram in Figure 7.16, xu is no longer used for 03. Instead the
word ‘pasi’ comes to dominate. ‘pasi’ has indeed the more specific meaning
[g 0.375-0.5], which is only applicable to 03, not to 05. At the same time we see from the
RF-diagram for 05 (Figure 7.18) that the word ‘rimebi’ initially dominates for designating
05. ‘rimebi’ has the more specific meaning [g 0.25-0.375], which is not applicable to 03.
The more general word xu is still useful in some contexts where the refined distinction is
not necessary (for example where either 03 or 05 is present but not both). So we would
expect that xu continues to exist. However, this is not the case: xu loses out completely
and its role is taken over by fepi.

Understanding why this is the case tells us a great deal about the kind of hidden
semiotic dynamics that takes place. xu loses its strength because (1) it fails in games
where its meaning is not distinctive enough so that its score goes down, and (2)
there are other words competing with xu which do not have strong alternatives and
are therefore less prone to failure. This is notably the case for fepi. fepi carries the more
general meaning of green and does not have competitors. It therefore overtakes xu.

This example shows many things. Clearly lexical dynamics can be very complicated,
despite the fact that the underlying mechanisms are relatively simple. Complexity comes
partly from the complexity of the environment that continues to challenge the agents
with novel situations, and partly from the internal complexity the semiotic dynamics
spontaneously generates. There are strong tendencies in the agents’ lexical systems towards
FM and MF coherence, in other words towards shared lexicons. These tendencies are not
due to a central controlling authority which has a global view of the lexicon and dictates
to the agents what they should do; they arise because incoherent form-meaning pairs do not
hold up when the environment changes. As we have seen, the word fepi could overtake
xu, because xu had alternative meanings that caused failures in novel situations and fepi
did not. fepi had a higher coherence and therefore survived, even though xu was used
more often but with an ambiguous meaning.



7.5 Rousseau’s paradox 

The self-organised coherence of lexicons is an important outcome of the experiments,
but what about the other semiotic dimension, the relation between situations and their
conceptualisation? To what extent do the agents use the same conceptualisations of reality
in the same situations (RM-coherence) and to what extent do they pick out the same topics
given the same meaning (MR-coherence)? Many different ways to conceptualise reality
may exist side by side and if each one can be expressed, it would lead to successful
communication. So there is less clear pressure for RM-coherence to increase. The
advantage of high RM-coherence is that the agents are more uniform in their behavior,
so that fewer hypotheses need to be considered and the acquisition of the lexicon by
virgin agents is easier.

7.5.1 Universality versus relativism 

This raises a fundamental question, which has been heavily debated throughout the
history of philosophy, namely to what extent different languages have different ontologies
and to what extent the use of a language influences ontological coherence in a population.
From one point of view, words are seen as labeling existing (innate or learned) categories,
and so the problem of learning a lexicon consists in learning the association between
unknown labels and known internal categories. 11 On the other hand, ethnologists and
linguists studying non-European languages almost invariably arrive at the conclusion
that there are profound differences between languages, not simply in which words they
use but also in which way they conceptualise reality. This implies that there is a kind of
co-evolution between language and meaning. 12

11 “The speed and precision of vocabulary acquisition leaves no real alternative to the conclusion that the
child somehow has the concepts prior to experience with language, and is basically learning labels for
concepts that are already part of his or her conceptual apparatus.” Chomsky (1987), p. 21.

Here is a seemingly trivial example of the interaction between language and meaning.
In French, the second person singular pronoun (‘you’) has two forms: ‘tu’ and ‘vous’.
Textbooks say that the first is colloquial and the second form is polite. A speaker is
therefore required to categorise the social relation between himself and the hearer, which
he is not forced to do in English. But polite/colloquial is too simplistic to capture the
underlying usages. The categorisation of the speech situation is quite subtle, possibly
incorporating age differences, professional status, class differences, pragmatic context,
speaking style, etc. Someone learning to speak French must not only learn that ‘you’
has two forms but also what the subtle distinctions are between the situations where
you use one or the other. If you think learning this distinction is difficult, just consider
Japanese, where there are dozens of words for ‘I’, some of them listed in Table 7.9.

Table 7.9: Words for ‘I’ in Japanese

A             Chin          As-shi        Fu-sho
A-i           Da-i-ko       A-ta-i        Ge-kan
A-ta-ki       Ge-se-tsu     A-ko          Da-ra
A-ta-ku-shi   Ge-sho        A-re          Fu-bin
A-te          Gi-ra         A-shi         Fu-ko-ku
A-ta-shi      Go-jin        Bo-ku         Gu

Here is another example which is more related to the perceptually grounded
distinctions we are studying here. To categorise space, a viewer typically imposes a frame of
reference on the scene before him and categorises regions and positions in terms of this
frame of reference. For example, standing in front of a chair we could say in English ‘the
table is to the right of the chair’, where ‘to the right of’ designates an area to the right
within the frame of reference relative to axes emanating from the observer. This seems the
most natural and simplest way to categorise space and it is used by the Talking Heads when
they expand the hpos channel. However, there are quite a few languages which impose
an absolute frame of reference, see Levinson & Wilkins (2006). For example, the
Tenejapans from Chiapas, Mexico speak a Mayan language known as Tzeltal. They live in a
mountainous region that is generally sloping north-northwest. They use this regional
characteristic to introduce an absolute frame of reference with three distinctions: uphill,
downhill, and across. Standing in front of the chair, they would literally say something
like ‘standing at its uphill chair the table’, in other words, ‘the table is uphill from the
chair’. The spatial categories left, right, front, back, etc. simply do not exist in Tzeltal.
Something that is to the left from the viewpoint of English could be ‘downhill’, ‘uphill’,
or ‘across’ depending on its absolute position. If we want to translate ‘to the left of’ into
Tzeltal, we first have to conceptualise the global reference frame that is valid in the
situation being described, and only then can we choose whether ‘uphill’, ‘downhill’, or
‘across’ is the appropriate translation. It could be argued that these differences are purely
due to differences in lexicalisation. But this is not so. An absolute frame of reference
has not only an impact on language but also deep implications for other cognitive tasks.
Psychologists have invented non-verbal tasks where speakers of Tzeltal make other
spatial inferences than Europeans. Ignoring such profound cultural differences is therefore
a sure recipe for disastrous misunderstandings.

12 The best known representative of this position is Whorf (1956). See for a wider discussion Lee (1996).

The degree of sharing (or non-sharing) of an ontology in a group raises a paradox
first expressed by Jean-Jacques Rousseau. Language requires a sufficiently shared
categorisation of reality, otherwise no communication is possible. But if every language
employs its own categorisation (even if there are large overlaps), how is a particular
individual entering a language community supposed to know the categorisation implicit in
his or her language? It is clear that language helps foster shared meaning because
meaning is transmitted through language, for example when a speaker explains the meaning
of a word to a hearer. On the other hand, successful language interaction already requires
at least some shared meaning; how could the system otherwise bootstrap itself? So we
have a chicken and egg situation, a causal circularity that somehow must be broken.

7.5.2 Ontological coherence 

The Talking Heads experiment shows a way to resolve this paradox. Different agents 
develop their own ontology in a selectionist fashion, using the growth and pruning dy- 
namics discussed in Chapter 4. Because there is some randomness in the growth process 
and because different agents see different cases in which other channels may be salient, 
it is highly unlikely that they all end up with exactly the same set of categories, even 
though agents operating in the same environment with a similar sensori-motor appara- 
tus will develop similar distinctions. 

On the other hand, the coupling between the conceptual and lexical layer discussed 
in the previous chapter causes a strong interaction between the two. Those distinctions 
whose lexicalisations are the most successful are preferred, and the scores of categorisers
are influenced by the outcome of the games in which they were used. This structural
coupling causes a progressive coordination of the ontology and the lexicon of the agents, 
even though ontologies will never be completely identical. 
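The coupling described above can be illustrated with a small sketch. It is not the update rule used in the experiment (that rule is given in earlier chapters); the class name, the starting score, the learning rate and the pruning threshold are all assumptions chosen for illustration. The only point is that a categoriser's score rises when a game in which it was used succeeds and falls when it fails, so that distinctions which do not lead to successful communication are eventually pruned.

```python
from dataclasses import dataclass


@dataclass
class Categoriser:
    """One distinction on a sensory channel, e.g. blue in [0.25, 0.5)."""
    channel: str
    low: float
    high: float
    score: float = 0.5  # assumed starting value, not taken from the text


def update_score(categoriser, game_succeeded, rate=0.1):
    """Nudge the score toward 1 after a success and toward 0 after a failure."""
    target = 1.0 if game_succeeded else 0.0
    categoriser.score += rate * (target - categoriser.score)
    return categoriser.score


def prune(categorisers, threshold=0.2):
    """Keep only categorisers whose score has not dropped below the threshold."""
    return [c for c in categorisers if c.score >= threshold]
```

Because the hearer's and the speaker's scores both depend on the same games, repeated interaction pushes the agents toward distinctions that work for both of them, without any agent ever inspecting another agent's ontology.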

Figure 7.19 shows the result of some experiments focusing on ontological coherence 
(i.e. RM-coherence). The agents reach a close to 100 % discriminatory success, and coher- 
ence climbs to 75 %. Multiple solutions are possible, and so there is no reason why the 
agents would have completely convergent ontologies. 

Figure 7.19: This figure shows the discrimination success as well as the ontological (RM)
coherence for a series of 5000 discrimination games in a group of 10 agents.

A more fine-grained way to visualise the emerging coherence between the agents’
ontologies is through a coherence web (see Figure 7.20). There is an axis for each agent
as well as a line emanating from the center of the web. Let a1 and a2 be two agents,
then a1’s line intersects with a2’s axis at the level of coherence between a1 and a2. For
example, if a1 and a2 have a 50 % ontological coherence, then a1’s line intersects a2’s
axis at point 0.5. The intersection between an agent’s line and its own axis represents the
average coherence of this agent with respect to all the other agents. When there is little
coherence, the lines cluster around the center of the cobweb. As coherence increases,
the lines approach more and more the edges of the diagram. If all lines end up exactly
on the edges, there is complete coherence. Similar coherence webs can be constructed
for the other semiotic relations studied earlier.
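The construction of a coherence web can be sketched as follows. The coherence measure used here (the overlap between two sets of categories) is only a stand-in chosen for illustration and is not the RM-coherence measure of the experiment; the function names are likewise hypothetical. What the sketch does follow from the description above is the layout: one axis per agent, the value of an agent's line on another agent's axis equal to their pairwise coherence, and the value on its own axis equal to its average coherence with all the others.

```python
import math


def coherence(cats_a, cats_b):
    """Stand-in measure: shared categories divided by all categories (0.0 to 1.0)."""
    union = cats_a | cats_b
    return len(cats_a & cats_b) / len(union) if union else 1.0


def web_line(agent, ontologies):
    """Return the value of this agent's line on every axis of the web."""
    others = [name for name in ontologies if name != agent]
    line = {name: coherence(ontologies[agent], ontologies[name]) for name in others}
    # On its own axis the agent shows its average coherence with the others.
    average = sum(line.values()) / len(others)
    line[agent] = average
    return line


def polar_points(line, axis_order):
    """Place each value on its axis; axes are spread evenly around the center."""
    n = len(axis_order)
    points = []
    for k, name in enumerate(axis_order):
        angle = 2 * math.pi * k / n
        radius = line[name]  # 0 = center of the web, 1 = edge of the diagram
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points
```

With little coherence all radii are small and the lines cluster around the center; with complete coherence every radius is 1 and the lines lie on the edge of the diagram.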

The evolution of the coherence webs in Figure 7.20 shows clearly that the agents are
coordinating their ontologies, despite the absence of direct feedback about the meaning
used in a game. Feedback only comes from pointing in the external world. More experiments
need to be done with more channels, so that the set of possible alternative conceptualisations
is sufficiently large to investigate the co-evolution of language and meaning
more precisely. Nevertheless, the experiments show that ontologies can be coordinated
without needing to be innate. 



7.6 Conclusions 

The experiments reported in this chapter have demonstrated that the various mecha- 
nisms introduced in earlier chapters not only work in computer simulations but also in 
experiments with embodied situated agents. The grounding of language games in physi- 
cal reality introduces perceptual and behavioral anomalies which may cause failures and 
additional ambiguities in the lexical systems of the agents. However, the mechanisms in-
troduced before, particularly the forces damping synonymy and ambiguity, still prove
to be adequate to lead the population towards a coherent successful language system. 

The lexical and ontological evolution observed even in small populations of agents
quickly becomes too complicated to investigate by hand. I therefore introduce a number
of tools, such as the semiotic landscape, the coherence diagrams, and the coherence 
web. We clearly need more of these tools and we need to study additional properties of 
emergent lexicons, such as the semantic relations between the words and how they may 
evolve. 




Figure 7.20: Coherence webs visualising the coherence of ontologies in a group of
10 agents for a series of 1000 discrimination games. Lines stay close to the
center when there is close to 0 % coherence. They approach the edges in the
case of 100 % ontological coherence.

8 The first series (1999)

The ideas and results that have been discussed in Part I of the present volume were
based on results obtained in idealised conditions. Computer simulations are able to test 
the validity of the basic algorithms, but simulations are still theoretical in nature. It is 
well known that technologies can only become a reality after experiments have been 
carried out in open-ended real world conditions. Nobody would want to fly an airplane 
that has only been tested in computer simulations. 

We did the same for the Talking Heads experiments and this was a huge step. Many
more people had to get involved: to make the code more robust, set up the
teleportation infrastructure, and install and maintain the physical installations in
different countries. The mechanisms of language evolution discussed in Part I of
this book did not significantly change. On the contrary, they received solid experimen- 
tal confirmation. At the same time, we learned a huge amount about the dependencies 
between these mechanisms and the environments in which agents use them. We also 
learned a great deal about running software agents in an open infrastructure distributed 
over the entire globe. Furthermore, we learned a lot about the interaction between
humans and agents, ranging from enthusiastic participation and enjoyment to nasty attacks
by English hackers set on destroying the experiment.

This chapter provides more detail on the first main experiment that took place in the 
summer of 1999 as part of the Laboratorium exhibition in Antwerp. The next chapter
documents follow-up experiments that took place within the context of the NOISE exhibition
in Cambridge and London and at several other locations. 



8.1 The Laboratorium exhibition 

The Laboratorium exhibition was a major artistic event in Europe during the summer of 
1999 organised by Bruno Verbergt of an organisation called Antwerpen Open. It was one 
of the first exhibitions that put so much emphasis on art as a research activity and on 
profound interactions between art and science. Participants included both scientists and 
science historians such as Peter Galison, Bruno Latour, Israel Rosenfield and Isabelle 
Stengers, as well as architects such as Rem Koolhaas and artists with an affinity for 
research such as Peter Fischli and David Weiss, Gabriel Orozco, Carsten Holler, Pana- 
marenko, Lawrence Weiner and others. The exhibition was visited by 8000 people. 

The introduction to the catalog edited by curators Hans Ulrich Obrist and Barbara 
Vanderlinden 1 stated the objectives as follows: 



1 The catalog of the Laboratorium exhibition was published as Obrist & Vanderlinden (1999).

Laboratorium is an interdisciplinary project in which the scientific laboratory and
the artist’s studio are explored on the basis of their various concepts within the dif- 
ferent disciplines. How can we attempt to bridge the gap between the specialized 
vocabulary of science, art and the general interest of the audience, between the ex- 
pertise of skilled practitioners and the concerns and preconcepts of the interested 
audience? 

Laboratorium will search the limits and possibilities of the places where know- 
ledge and culture are made. Throughout the summer we will establish within the 
city of Antwerpen networks, fluctuating between highly specialized work by sci- 
entists, artists, dancers, and writers. “Working places” where the participants com- 
municate their findings on the “work in progress”. Also the scientific laboratories 
in Antwerpen will be involved in the initiative. 

Laboratorium started as a discussion that involves questions such as: What is 
the meaning of laboratories? What is the meaning of experiments? When do ex- 
periments become public and when does the result of an experiment reach public 
consensus? Is rendering public what happens inside the laboratory of the scientist 
and the studio of the artist a contradiction in terms? These and other questions 
are being offered in this interdisciplinary project that starts with the “workplace” 
where the artists and the scientists experiment and work freely. 

The event was part of the activities celebrating the famous Antwerp painter Anthony
van Dyck, born 400 years earlier. In preparation for the exhibition, a series of discussions
was held between Carsten Holler, Bruno Latour, Hans-Ulrich Obrist, Luc Steels and
Barbara Vanderlinden. These discussions helped to shape the general concept of the
exhibition and the selection of the artists and scientists who would participate. Excerpts
appeared in the exhibition catalog and other publication fora (Obrist 2003).

The exhibition took place in two locations: the photography museum in Antwerp, 
which was the main location, and an annex occupying several floors in a high-rise office
building close to the central station (the so-called President’s Building). The Photography
Museum Exhibition featured work by several well known artists, architects and scien- 
tists. The annex featured the Talking Heads experiment as well as work by artists Joseph 
Grigely and Matt Mullican and science philosopher Isabelle Stengers, who had set up 
a reconstruction of Galileo’s famous experiments. There were also a number of public 
presentations under the heading “The Theatre of Proof”, organised by Bruno Latour. The 
catalog was designed by Bruce Mau and his team. 

Besides the installation at the exhibition in Antwerp itself, there were two
external sites for the Talking Heads experiment operating within the same time
frame: the Paris site at the Sony Computer Science Laboratory in Paris and the Brussels
site at the Free University (Vrije Universiteit) of Brussels.

Months of preparation and testing took place before we attempted to go “live” and
public. Once this happened, work continued frantically to shake out bugs, get the semiotic
dynamics right and then maintain the general communication infrastructure and
the ongoing interactions with participants. The fascinating email correspondence (see
section 8.3) shows that most of the difficulties initially came from running the agent
teleportation infrastructure. The reader has to keep in mind that this was the late nineties,
when large-scale uploading and downloading, cloud computing, agent architectures, etc.
were still in their infancy.

Figure 8.1: Hans-Ulrich Obrist and Luc Steels during the opening of the Laboratorium
Exhibition in Antwerp, summer 1999.

The real heroes of the first Talking Heads series were Angus McIntyre, who was the
chief architect of the agent teleportation infrastructure, Frederic Kaplan, who kept an 
eye on the Paris site in particular, Joris Van Looveren, who looked after the Brussels and 
Antwerp sites, and Mario Campanella, an aeronautical engineer from Brazil who had 
shown up at the last minute to help keep track of the Antwerp installation. I focused 
on the overall dynamics of the experiment, which initially was certainly not going the 
way it should, and on handling the contact with the exhibition organisers and the press. 
Silvere Tajan, Alexis Agahi and Holger Kenn helped to create the telecommunication
infrastructure. 

8.2 The installation 

The installation in Antwerp was announced as a “Laboratory for Cognitive Robotics 
and Teleportation” (Steels 1999). It featured various rooms as shown in the layout in 
Figure 8.2. The rooms had different functions: 

1. The central room which was visible on entering the space contained the experi- 
ment itself, i.e. the two cameras mounted on tripods, the computers driving them, 
the screen with geometric figures, and a projection of what went on. The activities 
of the agents were audible through a narration.

2. A reading room contained background philosophical and scientific papers and
excerpts from books about language, meaning and their origins. 

3. A user interface room contained a workstation where visitors to the exhibition 
could create their own agents, direct them to play games on certain sites, and teach 
new words to their agents. 

Typical examples of the kind of images that the cameras picked up are shown in Figure 8.4.
To play a game, the agent had to find a sufficiently delineated group of objects
and each object had to exceed a minimal size. Initially quite complex configurations were 
put on the white board but this made it very difficult for agents to find a coherent group 
and to develop good concepts for referring to the topic. The right image in Figure 8.4 pro- 
vides clear examples of up, down, left and right, different colors, and also opportunities 
to use multiple words (such as red bottom). 

A typical example of an interaction is shown in Figure 8.5. The speaker (called “rubber”,
an agent created and named by a user) has selected the blue object at the bottom left
of the screen. We see the discrimination trees of the agent at that point and the data for
each of the objects recognised. The coordinates of the respective objects (scaled within
the coordinate system of the captured image) are (0.0, 1.0) for the blue object, (1.0, 0.96)
for the red one, and (0.42, 0.0) for the green one. The values are displayed as hp (horizontal
position), vp (vertical position), h (height), w (width), a (area), r (red), g (green), y
(yellow), b (blue), l (lightness). The blue object is chosen as the topic and the value on the
blue channel has the most discriminative power, although there are other possibilities.
There are three competing words for naming blue: Xagadude, Nibidesu, and Tetipi. None
of them has a score higher than 0 and so a random choice of Xagadude is made.

Figure 8.2: Layout of the “Laboratory for Cognitive Robotics and Teleportation” at the
Laboratorium exhibition in Antwerp.

Figure 8.3: A single Talking Head at the Paris site. In the background we see (bottom)
the display with an image that the agent picked up, (on top of that) a screen
displaying the processed image and the discrimination trees, and to the right
the loudspeaker broadcasting the speech output of the agent.

Figure 8.4: Examples of images picked up by the cameras. The chosen topic was signalled
by drawing a bounding box around it. During experimentation, it was found
that configurations such as in the right image led to more stable performance.

The hearer (called “me”) perceives an image which is slightly different from the one 
seen by the speaker. Also the discrimination trees built up so far by this agent are dif- 
ferent from the ones used by the speaker. The hearer does not know the word, so the 
game fails. After the speaker has then pointed to the topic, the hearer conceptualises 
and guesses that the meaning of this word is ‘blue’ because that is also for him the most 
discriminating feature of the topic (which has coordinates 0.0, 1.0) compared to the other 
objects in the context. 
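The game just described can be traced in a few lines of code. This is a minimal sketch and not the agents' actual implementation: the function names and the representation of a meaning as a (channel, low, high) tuple are assumptions, and only the blue-channel values from Figure 8.5 are used. It shows the three steps in the text: the speaker checks that the category discriminates the topic, picks among equally scored words at random, and the hearer, not knowing the word, stores a new association once the topic has been pointed out.

```python
import random

BLUE = ("B", 0.25, 0.5)  # the category (B 0.25-0.5) from Figure 8.5


def discriminates(meaning, topic, others):
    """True if the topic falls inside the category and no other object does."""
    channel, low, high = meaning

    def inside(obj):
        return low <= obj[channel] < high

    return inside(topic) and not any(inside(obj) for obj in others)


def choose_word(lexicon, meaning, rng=random):
    """Pick the best-scoring word for a meaning; ties are broken at random."""
    candidates = [(word, score) for (word, m), score in lexicon.items() if m == meaning]
    if not candidates:
        return None
    best = max(score for _, score in candidates)
    return rng.choice([word for word, score in candidates if score == best])


def interpret(lexicon, word):
    """Return every meaning the agent associates with the word (possibly none)."""
    return [m for (w, m) in lexicon if w == word]


def adopt(lexicon, word, meaning):
    """Store a new word-meaning association with an initial score of 0."""
    lexicon.setdefault((word, meaning), 0.0)


# Speaker side: blue-channel values of the three objects as seen by the speaker.
topic, others = {"B": 0.39}, [{"B": 0.00}, {"B": 0.00}]
speaker_lexicon = {("xagadude", BLUE): 0.0, ("nibidesu", BLUE): 0.0, ("tetipi", BLUE): 0.0}
if discriminates(BLUE, topic, others):
    word = choose_word(speaker_lexicon, BLUE)

# Hearer side: the word is unknown, so the game fails; after the speaker
# points to the topic, the hearer guesses the meaning and adopts the word.
hearer_lexicon = {}
if not interpret(hearer_lexicon, word):
    adopt(hearer_lexicon, word, BLUE)
```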

The experiment also had a presence on the web, designed by Angus McIntyre. Anyone
could log in on the Talking Heads website, create an agent with a given name, and launch 
it on a tour of the various physical sites. There was also a forum on which users could 
discuss the experiment as it was progressing, a hall of fame for the agents that were the 
best communicators, and an overview of the lexicon that had formed so far. The website 
allowed inspection of what was happening at each site (Figure 8.6): users could check
which agents were waiting there to play a game, and what the current game was about. 
It also displayed statistics about the communicative success and agent activity at each 
site. The site is no longer operational due to changes to the underlying software but 
some remnants can be visited here: https://ai.vub.ac.be/talking-heads/ 

An example of how the web interface displayed a single game is shown in Figure 8.7. 
There are two agents: Antonusius is the speaker and Zelebot is the hearer. The green 
object has been chosen as topic and correctly recognised by the hearer using the word kazozo.
The meaning of the word was not visible through the interface because we wanted
users to learn the language of the agents by themselves.

8.3 Start up of the experiment 

Once the physical installation at the Antwerp site was completed, work focused on get- 
ting the experiment itself up and running. It is interesting and instructive to look at some 
snippets of the email correspondence during the first weeks. The despair when trying
to cope with the unavoidable problems, but also the excitement as a language began to
emerge, shine through. The English humor of Angus McIntyre was a welcome antidote
to all the stress. The correspondence reprinted here is a small selection and necessarily
fragmentary, because a lot of communication took place face-to-face or by
telephone, as email was not always possible.

[Figure 8.5, top panel: speaker “Rubber hole” utters “xagadude”. The discrimination trees are shown, with node labels “5: rectangu...” and “9: blue”.]

(0.00, 1.00): HP=0.37 VP=0.71 H=0.48 W=0.21 A=0.45 R=0.17 G=0.00 Y=0.00 B=0.39 L=0.28
(1.00, 0.96): HP=0.70 VP=0.69 H=0.38 W=0.22 A=0.45 R=0.98 G=0.00 Y=0.52 B=0.00 L=0.36
(0.42, 0.00): HP=0.51 VP=0.31 H=0.21 W=0.51 A=0.70 R=0.00 G=0.99 Y=0.73 B=0.00 L=0.46
Categorization: ((B 0.25-0.5))
Words: ((XAGADUDE ((B 0.25-0.5)) 0.00)
        (NIBIDESU ((B 0.25-0.5)) 0.00)
        (TETIPI ((B 0.25-0.5)) 0.00))
Choose: ((B 0.25-0.5)) - (XAGADUDE)

[Figure 8.5, bottom panel: the hearer receives “xagadude?”. The discrimination trees are shown, with node label “9: blue”.]

(1.00, 0.96): HP=0.70 VP=0.69 H=0.38 W=0.22 A=0.45 R=1.00 G=0.00 Y=0.50 B=0.00 L=0.32
(0.00, 1.00): HP=0.45 VP=0.67 H=0.50 W=0.22 A=0.48 R=0.17 G=0.00 Y=0.00 B=0.39 L=0.23
(0.42, 0.00): HP=0.59 VP=0.25 H=0.23 W=0.54 A=0.77 R=0.00 G=0.77 Y=0.82 B=0.00 L=0.40
Unknown word.
New association: (XAGADUDE) + ((B 0.25-0.5))

Game 1140 Thu-24-02-2000 5:20 pm

Figure 8.5: Top: Source image, conceptualisation, and word choice of the speaker. Bottom:
Source image, parsing and interpretation of the hearer.

[Screenshot of the Talking Heads website. Menu entries: Sites Info, World, Servers, Hall of Fame, Forum, Lexicon, Agents, List, Create, Login. Page text: “The following sites are currently running in different places of the world. You can see the list of the agents on each site. More information including the latest images seen by the agents is also available.” Below this, the agents present at the Brussels site are listed.]

Figure 8.6: The Talking Heads website main interface. Users could create and manage 
their own agents and inspect what was happening at each site. Here 12 agents 
located at the Brussels site are listed.

[Screenshot of the web interface, “Last interaction at the Antwerp site”, showing the images seen by the speaker Antoniusus and the hearer ZeleBot. Page text: “Antoniusus has chosen the highlighted object as topic. It has used the word kazozo to express how this object is different from the other objects in the image. ZeleBot has interpreted kazozo as describing the highlighted object. The Game was a SUCCESS. Time elapsed since last game: 1 minute 12 seconds.”]

Figure 8.7: Example of an interaction that took place in Antwerp. This game was a success.
The agents recognised the same set of objects and the meaning of kazozo was
effective in finding the same topic.

We pick up the email correspondence on June 23, 1999 when the first experiments are
taking place to try and make language games at the three sites possible and allow the 
scheduling and activation of agents. 

23 June 1999 



-============_-1278188520==_d============

From: Angus McIntyre <angus@csl.sony.fr>

Subject: Re: Brussels is back

At 9:04 pm +0200 23.06.99, Joris Van Looveren wrote:
>Brussels is officially back on line. I forgot to make sure that there
>were no agents here before I launched it, so don't be surprised if it
>turns out that some agents have suddenly been cloned :)

I don't think agents can be cloned; 'headnet' will probably just
say "Oh, back so soon?" and overwrite existing definitions.

This may lead to inconsistencies if two copies of an agent are
out and about at the same time ...

Brussels is active, which is good. Paris appears to have gone 
to sleep, either because the lights are out, or because it's 
past its pre-defined bedtime. Still, the Babel software still 
seems to be running with nothing more than the usual email
process errors.

>- The machine is still fast enough to do other work. The only
>  thing that is a bit annoying is that it stalls when images
>  are being grabbed (read: sent over the bus).

There's a lot of data moving over a slow line, and evidently the
MacOS is giving it priority to prevent dropped frames when doing
video capture.

>- Since there is no sound, it is blazingly fast compared to
>  the Antwerp installation (2-3 games per minute or so)

Speech and camera motion are the big slow downs. This means
that we're not under a lot of pressure to optimize the agent
code, as the agents can do all their discriminations faster than
we'll ever be able to drive the cameras (until we try using the
little EVI-G21's, which only have to move a little CCD unit
rather than a whole camera assembly, and apparently track very
quickly indeed).

I've announced the site to a handful of links pages (Peter 
Norvig's AI links collection, Chris Bogart's Constructed 
Languages page, the NASA real robots page - links to all these 
sites are under 'background' on our site). I haven't yet done a
massive submit of the site to the big search engines or
announcement sites; I suggest that we let traffic build slowly
so we can find out what we can handle.

A



24 June 1999 

The (faster) machines that were planned are now available, so that a switch has to be 
made while the experiment is already running. 



-============_-1278188520==_d============

From: Angus McIntyre <angus@csl.sony.fr>

Subject: 'headnet' now running on new PII 400 server.

Silvere and Alexis have installed Linux and the headnet software
on the new 400Mhz Pentium II server, and I've done the DNS
switch (and updated /etc/hosts, and csl.zone, and sony.rev, and
sony.zone, and killed half a dozen in.named processes, and
killed squid, and restarted squid, and killed squid, and
restarted squid, and killed squid, and deleted squid's cache,
and restarted squid, and flushed my Netscape cache, and flushed
everyone else's Netscape cache, and flushed the Netscape cache
of eight dozen people in Australia who I've never met, and
individually inspected and edited every packet on the Internet
to make sure that 'headnet.csl.sony.fr' now points to the new
box).

The result of this is that 'headnet.csl.sony.fr' should now get 
you the P400 server, and 'headnet-dev.csl.sony.fr' should get 
you the old 200Mhz Vectra, which can be recognised because it 
has the words 'test site' in the banner that appears at the top 
of the page. 

With luck, the change should be transparent and you shouldn't 
need to do anything. You should however check to make sure that 
your agents are going to the right server, however, and that 
when your browser looks for 'headnet', that it gets the correct 
machine. If your Babel server accesses 'headnet' via a cacheing 
proxy server, you may also need to ask your sysop to clear the 
server's cache and restart it. 

Enjoy

A

-============_-1278188520==_d============

To: Joris Van Looveren <joris@arti.vub.ac.be>

From: Angus McIntyre <angus@csl.sony.fr>

Subject: Re: Brussels 2pm

At 2:00 pm +0200 24.06.99, Joris Van Looveren wrote:
>We're planing to go to Antwerp in 1 hour. There is no telephone there,
>so we will communicate by email ... We're thinking of clearing the
>database around 4 pm. ... do you want to do it from Paris ?

You do it. At 3:55pm, I'll kick all the agents off 'paris' and
pause it. When I see headnet has been cleared, I'll restart
'paris' and make some new agents.

I'll upload the changed TH website with the direct links to 
'headnet' at the same time. 

A 



-============_-1278188520==_d============

From: Angus McIntyre <angus@csl.sony.fr>

Subject: Things to do in Antwerp when you're head(net) ...

There seems to be some problem with the changeover of the
machines. What's happened is that the new address for 'headnet'
(193.105.194.10, which is to say the PII 400, rather than
193.105.194.8, which is the Vectra formerly known as 'headnet'
but now represented by an unpronounceable symbol that means
'headnet-dev') has not yet propagated through the DNS as far as
Belgium.

So Antwerp (yeah! welcome to the net!) is sending its agents to 
'headnet-dev', rather than 'headnet'. There's no great problem 
here as far as testing your installation goes, except that 
'paris' is pointing at the new 'headnet' and 'brussels' is 
apparently dead, so you won't be seeing any foreign agents. 

In theory, if you reboot, the DNS results cached on the 
Macintosh should go away, forcing a re-lookup which ought to get 
the correct values. If that doesn't work, you could try taking 
the MacTCP DNR file which lives in the System Folder *out* of 
the System Folder and then rebooting. And if that doesn't work, 
we just have to be patient and wait for the changes to propagate. 
You could also consider entering the address of 'headnet' as 
dotted IP (193.105.194.10) in the network configuration dialog. 

A

From: Angus McIntyre <angus@csl.sony.fr>

Subject: Re: Database cleared

At 5:47 pm +0200 24.06.99, Joris Van Looveren wrote:
>The databases have just been cleared. You can create new agents now.
>Brussels might be down for some more time, so check if it's up before you
>send any agents there.

OK. I notice there are bugs in the server code which cause it to
generate scads of errors when the database is empty, but those
should clear.

I've restarted 'paris', and I'm now going to make some agents to
send there. Antwerp is showing up on 'headnet', and the error
messages strewn all over the site should go away when we get
some data into the database.

When it begins to look a little more solid, I'll upload the
changed pages on 'talking-heads', and then we'll really be live.

A



From: Angus McIntyre <angus@csl.sony.fr>

Subject: Re: Webcam

At 6:28 pm +0200 24.06.99, Joris Van Looveren wrote:
>The Antwerp webcam is on-line:

I've added a link from the Antwerp page on the main site, and
I've also added a link in the descriptive text about the Antwerp
server which appears on the 'server overview' page.

I actually saw one of you - Fred? - through the Antwerp webcam a
moment ago.

> Also, the Antwerp and Brussels site are up 'permanently',
> so can send agents to them.

I've created a bunch of agents and sent them round the sites.

The server seems to be acquiring consistency.

I'm off home now. If you need me to come back and kick the
server, call me on: +33-1-42-78-xx-xx

Bye, A







8 The first series (1999) 



25 June 1999 

The different sites and the teleportation infrastructure are now running but there are 
still basic problems stemming from hardware and software glitches. 



-============_-1278188520==_d============

Date: Fri, 25 Jun 1999 05:13:55 EST

From: "frederic kaplan" <fred@captage.com>

Subject: First night

The talking-heads in Antwerp have passed the night OK... 34
agents 18 users... It goes fast

It seems that Paris and Brussels are not playing anymore ?

Fred.



From: Angus McIntyre <angus@csl.sony.fr>

Subject: Re: Antwerp & Brussels live!

At 10:52 am +0200 25.06.99, Joris Van Looveren wrote:
>I got e-mails from Antwerp and Brussels that they resumed normally this
>morning. So they both survived their first night on the job!

Paris seems to be still ticking along. One thing I notice is
that it's quite easy for agents to 'pool' on one particular 
server at the expense of others. The more agents you have on any 
one server, the longer it takes for each agent to get through 
its assigned 50 games or whatever. So servers with many agents 
tend to stay occupied, and others which are waiting to receive 
agents can often stand empty waiting for the agents to play out 
their games on the crowded servers. 

It's probably a good idea to have a few agents (either the ones 
owned by 'adm' or your own agents) set up to make round trips, 
playing five games here, five games there, and so on. Long 
routes of small numbers of games rather than short routes 
involving many games is probably the way to load balance and 
prevent servers drying up.

> ... the database is still accessible through a simple URL,
>without any protection (password). I think since headnet has gone
>public, it would be a good idea to change this ...

The same idea had occurred to me. I'll look into that (trans:
I'll try to find the piece of paper on which I wrote down the
root password to 'headnet' and, if I find it, I'll set up an
.htaccess file to keep strangers out).

I've told a few friends of mine about it, and the response so
far has been generally positive, although Sylvia - Luc's 
proofreader - apparently has some reservations about the 
usability of the site. 

Hmm. News update. 

A friend of mine, who shall be nameless, sent down an agent with
route: 'brussels paris antwerp'

As I predicted a few days ago, this will (and did) cause a LISP-level error.

I patched the route by hand and then made the mistake of trying 
to use MCL's rather flaky 'Restart frame' option to go on. At 
first this seemed to have worked, but about one game later 
things went bad and the whole machine locked. One of the three 
agents then on the server managed to get out just seconds ahead 
of the meltdown, but two more were caught on the server and are 
now in limbo. I've edited the database for one of them to try to
resurrect it, and if that works, I'll try to revive the other as
well.

I'll also write a quick bit of code to do route-checking to try 
to make sure that this doesn't happen again. 

A 



-============_-1278188520==_d============

From: Angus McIntyre <angus@raingod.com>

Subject: The Lazarus Syndrome

When a Babel server falls over and agents get lost, it's
possible to restore them to life by going onto Headnet's 'admin'
page and editing the agent's entry in the 'THAgent' table.
Basically, if you reset the 'isonserver' field to 1 (it will be 0 
when the agent's away), the agent should spontaneously reappear 
on headnet and then move off to wherever it was meant to go. 
There will be inconsistencies in the database - 'headnet' will 
remember all the words that the agent used before the crash, but 
the agent's own memory is reset to what it was when it left 
'headnet' - but with luck they shouldn't be too serious. 

By the way, there's a bug in the 'headnet' software with respect 
to routes; there's a finite limit on the field length for the 
'route' field. If you make a route that's too long, it gets 
truncated, so you can have an agent whose route goes: 

... paris 10 bruss

The agent will complete its games on 'paris' and then go into
stasis waiting for a server called 'bruss' to pick it up. A patch
for this would probably be nice to have at some point.

A
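
The two route problems reported in these messages (a route without game counts, and a route truncated by the field length limit) suggest what a route check could look like. The following is only a sketch in Python, not the code that was actually written for 'headnet'. It assumes that a route is a whitespace-separated list of '<server> <games>' pairs, as the fragment 'paris 10 bruss' suggests; the server names and the maximum length used here are hypothetical placeholders.

```python
# Hypothetical values: the real list of servers and the real length limit
# of the 'route' field are not given in the messages above.
KNOWN_SERVERS = {"brussels", "paris", "antwerp"}
MAX_ROUTE_LENGTH = 64


def check_route(route, known_servers=KNOWN_SERVERS, max_length=MAX_ROUTE_LENGTH):
    """Return a list of problems found in a route string (empty if none)."""
    problems = []
    if len(route) > max_length:
        problems.append(f"route exceeds {max_length} characters and would be truncated")
    tokens = route.split()
    if not tokens:
        problems.append("route is empty")
        return problems
    # Assumed format: server name, game count, server name, game count, ...
    for position in range(0, len(tokens), 2):
        server = tokens[position]
        if server not in known_servers:
            problems.append(f"unknown server '{server}'")
        if position + 1 >= len(tokens):
            problems.append(f"no game count after '{server}'")
            continue
        count = tokens[position + 1]
        if not count.isdecimal() or int(count) == 0:
            problems.append(f"invalid game count '{count}' after '{server}'")
    return problems
```

Under these assumptions, both the route 'brussels paris antwerp' and a route ending in 'paris 10 bruss' would be rejected before the agent leaves, while 'brussels 5 paris 5 antwerp 5' would pass.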



-============_-1278188520==_d============

From: Holger Kenn <kenn@arti9.vub.ac.be>

Subject: Alpha box in Antwerpen gone...

Hi!

The alpha box in Antwerpen is gone.

Please somebody reboot it ASAP, or we won't have a connection to
Antwerpen anymore...

Holger

p.s.: To reboot: Disconnect printer. Switch alpha box off. Switch
on again. Wait about 5 minutes. Reconnect printer.

Holger



28 June 1999 

After the basic hardware appeared to be operational, attention turned to the actual
behavior of the emergent language system. Initial results were not very encouraging.



-============_-1278188520==_d============

From: Luc Steels <steels@arti.vub.ac.be>

Subject: paris site

I just examined the Paris site a bit. The error rate is
distressing. This is due to many things.

1. The calibration is way off. Even if they get it right,
calibration mixes up completely the game. It is obligatory to
calibrate much better tomorrow morning (can you do this
frederic?)

2. The light conditions were very bad and have been improved a
bit so that patches of white light are no longer seen as objects.

3. The visual situations about which the agents are playing 
games are way too complex. Even humans would not be able to play 
the game. I simplified enormously. We need to make similar 
clear situations AT ALL sites so that the agents can really 
learn the very basic concepts first. Also salience might have 
to be set lower to have multi-categories and consequently 
multi-words (although there is a fundamental bug in the 
multi-word thing it seems). Now the agents are making much too
deep discrimination trees because of the confusing nature of the
situation and the errors in pointing. If the top agent only has
20 in the setup because it is pure chance. The fact that success
drops means that self-organisation is NOT taking place.

4. We might have to change the word creation rate to be less
than 1.0. In the present circumstances I believe that every
agent will make for every node in the tree at very great depth
his own word.

I suggest that tomorrow we go through games with the agents step 
by step and fix the environment and the lights, etc. so that the 
game is at least feasible; at the moment it is not. The same 
exercise will have to be done in Brussels and Antwerp. There is 
a question how we can get rid of all the garbage that is being 
created right now. 
luc 



Date: Tue, 29 Jun 1999 13:07:48 +0200 

From: Joris Van Looveren <joris@arti.vub.ac.be>

Subject: Re: paris site

Luc Steels wrote:

> 2. The light conditions were very bad and have been
> improved a bit so that patches of white light are
> no longer seen as objects.

The problem is reflections on the whiteboard. In Brussels the
lights are covered so that there is no direct light falling onto
the whiteboard. Consequently, it happens rarely if ever that
parts of the background are seen as objects.

Also, turn on the 'back light' option of the cameras. This 
improves the image quite a bit, especially at lower light 
intensities. I'll try to find out if it can be turned on and off 
from software, so that the setting can be saved along with the 
other configuration data. 

Joris.



-============_-1278188520==_d============

From: Luc Steels <steels@arti.vub.ac.be>

Subject: Major error in feedback

Based on the dismal performance in Paris, we discovered a
major error in the feedback. I suggest that all sites at this
point PAUSE until we fix this and then we update versions on
different sites. This bug may explain why the lexicon has
disintegrated.
more news soon, 
luc 



29 June 1999



-============_-1278188520==_d============

Date: Tue, 29 Jun 1999 11:38:23 +0200

From: Joris Van Looveren <joris@arti.vub.ac.be>

Subject: Antwerp OK

Antwerp has not been down, contrary to what the server page
showed. The problem was that the network was not accessible
any more because the network interface of the alpha machine was 
down. This meant that the proxy was not available. According to 
Mario, the games have continued, so probably in a couple of 
hours when all interactions and agents have been uploaded, the 
database will be up to date again. 

At this moment, Brussels and Paris have been paused until 
further notice. Interactions continue to be uploaded until the 
network interface has caught up. 

Joris (in Brussels) & Mario (in Antwerp).



-============_-1278188520==_d============

Date: Tue, 29 Jun 1999 16:35:46 +0100

From: Frederic Kaplan <kaplan@csl.sony.fr>

Subject: Patches

Bugs fixed

1. Feedback mechanisms and find-segmented-pointed by the hearer
(this was not working at all...) corrected in:
segment-tools.lsp and th-world.lsp

2. Scaling. If a value is higher than the maximum, it gives max
as opposed to 1.0!!! corrected in: geom.lsp

3. Masking. Eliminating zero channels should be done on the
source not on value (= value after scaling, so it is often zero).
corrected in: prototype.lsp (in the functions folder)

After these patches, sites can be running again.

Fred and Luc.
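
To make the scaling and masking fixes in this message concrete, here is a small sketch in Python. It is not the original Lisp code from geom.lsp or prototype.lsp; the function names, the channel names and the ranges are hypothetical, and the reading of "masking" as "drop a channel only when its raw source value is zero" is an interpretation of the message above.

```python
def scale(value, minimum, maximum):
    """Map a raw channel value onto [0.0, 1.0], clamping out-of-range input."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    scaled = (value - minimum) / (maximum - minimum)
    # A value above the maximum must yield 1.0, not the raw maximum itself.
    return min(1.0, max(0.0, scaled))


def active_channels(raw_values, ranges):
    """Scale every channel whose raw (source) value is non-zero.

    The zero test is applied to the raw value, because a perfectly valid
    reading equal to the channel minimum becomes 0.0 after scaling.
    """
    return {
        name: scale(value, *ranges[name])
        for name, value in raw_values.items()
        if value != 0
    }
```

For example, with a hypothetical range of (10, 100) for the horizontal position channel, a raw reading of 10 scales to 0.0 but is kept, whereas a channel whose raw reading is 0 is masked out.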



-============_-1278188520==_d============

Date: Tue, 29 Jun 1999 23:25:45 +0200 (MET DST)

Subject: There is hope for robokind 

After fixes to the Paris site by Frederic and me, I relaunched 
some agents. Things are now beginning to work as they should. 

We can see the agents quickly build up a successful lexicon. 

It is a pity that at the moment one can no longer give more than 
20 games (although I can see the goal of getting to a global 
coherent lexicon that way). I try to circumvent by immediately
sending them 3 times to Paris but I am not sure this works. 
Brussels and Antwerp should wait until all the fixes have been 
made before re-entering the network. Frederic and I will send a 
mail tomorrow morning. Once the system works, it is quite 
exciting to see your agents move forward quickly! 

Luc 



30 June 1999 



-============_-1278188520==_d============

Date: Wed, 30 Jun 1999 11:16:41 +0200 (MET DST)

From: Luc Steels <steels@arti.vub.ac.be>

Subject: progress 

The mails to talking-heads go in a log file and I suggest that 
everybody not only sends technical issues but remarks. 

The Paris site is really taking off now. A lexicon is beginning 
to be in place, mostly focusing on positional information (left, 
right, up, down). Because the lexicon is getting established,
it is now easier for new agents to acquire the existing lexicon 
and be successful. The agents have less bushy discrimination 
trees and fewer but effective words. This was demonstrated by 
frederic's agent (Kant) which quickly came to the top as speaker 
after only a few games! 

The word "green" apparently means red. So I am teaching my 
agents the word rouge instead of green. It is not yet clear to 
me whether you can influence the lexicon once it is already 
firmly established. 

Later this afternoon we might enrich the agents' experiences by
changing the environment a bit.

Luc



-============_-1278188520==_d============

Date: Wed, 30 Jun 1999 17:16:19 +0200

From: Joris Van Looveren <joris@arti.vub.ac.be>

Subject: Re: paris site

Luc Steels wrote:

> 4. We might have to change the word creation rate to be less than
> 1.0. In the present circumstances I believe that every agent will
> make for every node in the tree at very great depth his own
> word.

Don't do this yet. It will cause the system to crash sooner or 
later, as I experienced in Antwerp and Brussels today. The 
reason is the method utterance-word-string is not defined on 
null-utterances. I'm trying to find out what result this method 
should return. 

I've been in Antwerp today to get it running again and to apply 
the patches. It had run well for quite some time when I left, 
but something caused it to crash again half an hour later. I'll 
try to fix it tomorrow with Mario, if he has the time to go 
there.

Joris.



By mid July, the basic infrastructure, the teleportation mechanisms and the semiotic 
dynamics were running smoothly and so the experiment could operate without constant 
care. 

8.4 Results of the experiment 

The first Talking Heads experiment ran for 4 months during the summer of 1999 and 
showed the validity of the mechanisms that were used for the agent architecture and 
of the interaction patterns and group dynamics of the agents. A shared lexicon and an 
underlying conceptual repertoire emerged, enabling successful communication by the 
agents about the scenes before them. In total, 400,000 grounded games were played. The 
population of agents rose to just under 2000, increasing steadily over the period of the 
experiment. Despite the many perturbations due to grounding, intermittent technical 
failures, a continuous influx of new agents entering the population, and unpredictable 
human interaction, the lexicon was maintained throughout this period. 

The rate of communicative success for the first 200,000 games is shown in Figure 8.8.
We see that the success rate is generally between 70 and 80%. There are occasional
sharp drops (e.g. around game 90,000) caused by problems at a particular site (such as
bad light conditions).

Figure 8.8: The y-axis shows the average communicative success of agents at each of the
three different sites for a total of 190,000 games shown on the x-axis.



A total of 8000 words and 500 concepts were created by the agents, with a core vocabulary
consisting of 100 basic words expressing concepts like up, down, left, right, green,
red, large, small, etc. Of these, 8 words account for a large majority of word use (about
80%). Four of these words refer to the position of objects: gorewa (top), down (bottom),
wogglesplat (left), and sesubipu (right). Four other words refer to colors: rouge (red),
kazozo (green), wegirira (blue), and empty (light). The distribution of these words after
130,000 games is shown in Figure 8.9.

Figure 8.10 shows the semiotic dynamics related to the expression of the meaning for
the concept ‘left’, i.e. a horizontal position between 0.0 and 0.25 (after scaling). The word
“wogglesplat” becomes dominant although there is very strong competition in the
beginning. We see that users try to give other words to the same meaning, such as “gauche”
or “links” (meaning left in French and in German or Dutch respectively). We
also see that other words such as “red” or “yellow” get associated with this meaning,
because the hearer may guess the wrong meaning when learning the word for left.

Figure 8.11 shows another example of the semiotic dynamics in the experiment, this
time looking at all the meanings for a particular word, namely droite (meaning ‘to the
right’ in French). This word has clearly been introduced by a human user and the
dominant meaning progressively becomes a region (between 0.75 and 1.0) on the horizontal
position channel, as could be expected. Along the way we see some confusion with other
meanings, in particular with a vertical position between 0.5 and 1.0, presumably because
some objects appeared both on the right and high up in the scene.
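
The meanings discussed here are regions on a scaled channel, written as [HPOS 0.75-1.0] in the figure legends. The boundaries that appear in those legends (0.5, 0.75, 0.875) are consistent with repeatedly halving the interval from 0.0 to 1.0. The following Python sketch illustrates that reading; the halving rule is an assumption based on the legend values, and the function names are hypothetical.

```python
def regions_containing(value, depth):
    """List the nested regions of [0.0, 1.0] that contain value, one per level."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be scaled to [0.0, 1.0]")
    low, high = 0.0, 1.0
    regions = []
    for _ in range(depth):
        middle = (low + high) / 2
        # Keep the half of the current region in which the value falls.
        if value < middle:
            high = middle
        else:
            low = middle
        regions.append((low, high))
    return regions


def describe(channel, region):
    """Format a region in the notation used in the figure legends."""
    low, high = region
    return f"[{channel} {low}-{high}]"
```

An object at scaled horizontal position 0.9 falls in the regions 0.5-1.0, 0.75-1.0 and 0.875-1.0, so a word such as droite can plausibly be associated with any of them, while an object at position 0.1 falls in 0.0-0.5 and then 0.0-0.25, the region given above for ‘left’.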







Figure 8.9: Distribution of word use. There are only a few words that are dominant. Many 
words are short-lived, either because the circumstances in which they fit are 
rare or because they do not get settled in the population. Moreover sometimes 
users create agents which they do not keep rescheduling for new games. 



Figure 8.12 shows different meanings competing for the same word bozopite. Domi- 
nant ones are a large area (area scaled between 0.5 and 1.0) or extended width (width 
scaled between 0.5 and 1.0). These two concepts apparently could both be applied in 
many situations, but at some point new situations appeared that caused a symmetry 
breaking and area became dominant. 

It was also possible that different meanings which were highly compatible were main- 
tained in the population. An example is shown in the form-meaning diagram in Fig- 
ure 8.13, which plots the average frequency of meanings for the word down. The mean- 
ings are all concepts on the vertical position channel vpos, but they carve out smaller 
and smaller regions. 



8.5 Conclusions 

The Talking Heads experiment was without doubt a success from many angles. The
mechanisms for concept formation, lexicon formation and alignment (as discussed in
Part I of this book) all worked out as expected, even in very difficult ‘real
world’ conditions. The complex hardware could be maintained at the different physical
sites and the general web infrastructure and agent teleportation mechanisms held up
despite a significant scale-up. An enthusiastic user group formed and they actively
created agents and sent them out on the network, often also teaching their agents human
language words which then propagated in the rest of the population. Users became very
attached to their agents, were upset when their agents could not get to the sites they had
scheduled (because a complete language game took almost 1 minute and so other agents
had to wait), and tried to figure out how and why their agents had learned certain words.

Figure 8.10: Meaning-Form diagram: Different words for expressing the meaning ‘left’,
i.e. horizontal position is less than 0.25 (scaled). New words come up all the
time but there is a clear winner-take-all effect with “wogglesplat” winning.
Legend (words): bevuwu, bozopite, centerlft, danuve, fibofure, gauche, links, mamit,
mefali, red, rekifini, rotota, rouge, sisibuta, sowuwi, sulima, tonoto, tonozibo,
vizonuto, wegirira, wogglesplat, wolf, wopuwido, xesofage, xomove, yaretile,
ybgrshapes, yellow.

Numerous talks were given and various papers published in scientific fora. There were
also various talks within the art context. 2 The experiment received wide coverage in the
media thanks to its public exposure as part of a major exhibition. The Süddeutsche Zeitung
called it “Angels with Internet wings”. All this led to invitations to show the experiment
in other venues as well, as discussed in the next chapter. Regrettably, the challenge of just
keeping the experiment in the air with the available human resources and the pressure
to go on with new experiments prevented us from doing more adequate data gathering
and analysis. Nevertheless some analyses appeared, particularly those carried out by
Frederic Kaplan (Kaplan 2001).



2 Examples are a “gallery talk” at Salon 3 at the Elephant and Castle Centre in London on 2 December 1998,
organised by Hans Ulrich Obrist and Molly Nesbit, and a presentation together with Hans-Ulrich Obrist 
and Rem Koolhaas in Antwerp to launch the Laboratorium book on 3 October 2001. 



Figure 8.11: Form-Meaning diagram: Different meanings associated with the word droite.
The dominant meaning is in line with the human use of the word, namely
horizontal position (hpos) to the right of the image.
Legend (meanings): [BLUE 0.375-0.5], [HEIGHT 0.0-0.5], [HPOS 0.5-1.0], [HPOS 0.75-1.0],
[HPOS 0.875-1.0], [LIGHTNESS 0.5-1.0], [VPOS 0.5-1.0]



The field of complex systems science was at that time still in its infancy and adequate 
tools for analysing language as a complex adaptive system were in the early stages of 
development. 

The experiment ended with the following “tongue-in-cheek” post by Angus McIntyre 
on the Forum: 



1999-10-14 15:08:29 Angus McIntyre 

Bad news and good news 

As the subject says, we have some bad news and some good news. 
First, the bad news. The current run of the Talking Heads 
experiment will come to an end on the 5th of November. After 
that date, access to the system will be closed off, meaning that 
you won't be able to create, launch or inspect agents any more. 
We realise that this will be a sad day for all of you who've 
participated so enthusiastically in the experiment. We will 
consider setting up self-help programs for anyone unable to cope 
with the pain of 'Talking Heads withdrawal'. 

Figure 8.12: Form-Meaning diagram: Different meanings associated with the word bozopite.

Figure 8.13: Form-Meaning diagram: Different meanings associated with the word down.
There are in particular two meanings, one is a region on the red-channel and
another one a region on the vertical position channel.

Now the good news. The Talking Heads *will* be back. We
currently expect to launch an improved version of the system
early in 2000, probably late in January. We shall use the
intervening months to try to make our software faster and more
stable. When the system returns, we should also have some new
sites, both public and private. And we're thinking about trying
to find ways to make the system more interesting (i.e. by
giving you greater control over your agents and the way that
they learn and interact with other agents).

If you'd like to comment on your experiences with the Talking
Heads, or suggest ways that the system could be improved, this
forum would be a good place to do that. I can't promise that
we'll implement all your suggestions (we don't have very much
time), but all your messages will certainly be read and
considered.

In the meantime, on behalf of all the Talking Heads team, I'd 
just like to say 'thank you' to all of you for taking part and 
making the experiment a success. 

Angus McIntyre 

Agent Public Relations Officer 



Here is a comment from one dedicated user:



1999-10-27 18:52:04 Kampi 

RE: Bad news and good news

Are you crazy? Why do you do this at a time where winter with 
its long darkness is just ahead. Taking away from me the last 
beeings I can realy understand? So, of course I'm very sad about 
the bad news. And I insist on a self-help program otherwise I 
don't know what will happen to me. I propose to create some kind 
of holiday-camp for my agents which I can run on my computer; 
for example with some beach scenery, quiet apartment with TV and 
pool and only a little bit of teaching abilities, so that they 
don't become totaly stupid although they will be able to recover 
from all these mad and debile cans in Paris, Brussels and Tokio; 
please implement the possibility that it's me who swiches off 
the light at night; then, and only then, I can be sure they have 
a good time until the restart. But for serious: I hope very
much that the Talking Heads will return as soon as possible.
And I think you should inform those who are interested (me for
example) by e-mail about the re-start. Some wishes for the
improved project: a better performance, especially for the
lexicon. And much, much more information about the scientific
background. How do the agents create their words? Are these
guys on the Web realy part of the 'official' experiment? Why
did one of my agents suddenly create a three-part word (was
there a bug in the software, how could it asume, that anyone
would understand this, is it a syntactical genius and therefore
its only natural that noone understands it or is it simply too
stupid to understand the rules of the game? Is it the only agent
who did this, why did it never try this again?... and so on).
Anyway, more explanation for the 'future-heads' about the
experiment so that I understand why I should teach the agents.
As far as I understood they were created to make sense (develop
language) by their own. Yes, I know, 'stop making sense'. It's
their job, not mine. Therefore at least a very big 'thank you'
for having published the 'talking-heads' on the net.

A very sad user

P.S.: Please send me the holidy camp including 2 single rooms
and my agents Caspar and Leyla as a zipped file by e-mail.



Shortly after the first Talking Heads exhibition in Antwerp, a new opportunity arose in
2000 to set up and run another large-scale experiment as part of a major exhibition called
NOISE, curated by Adam Lowe and Simon Schaffer in Cambridge and London. This was
potentially a great occasion because the exhibition was about issues of language origins,
coding, replication, and noise. Moreover it would allow the expertise that was built up in
the first experiment to be reused and tested again, and hopefully yield more data for analysis.
So two new installations were set up, one in Cambridge and one in London, with additional
installations at the VUB Artificial Intelligence Laboratory in Brussels, the Sony
Computer Science Laboratory in Paris and in Tokyo, and at the Intelligent Autonomous
Systems laboratory of the University of Amsterdam (at the initiative of Ben Kröse). Later
on a further opportunity presented itself to add an installation at the Palais de la
Découverte, the main science museum of Paris. We also created a mobile version that was
shown temporarily in several locations, as part of other exhibitions, workshops, and
conferences. All this further expanded the audience. The Paris exhibition alone was
seen by 300,000 visitors, augmented with hundreds of active participants and
many onlookers through the Internet.

Although all these installations were very instructive from a technological point of
view, and certainly spread the word, we were reaching a point where they were no longer
of interest from a scientific point of view. The problems of maintaining public sites were
overwhelming our scarce resources and the NOISE experiment was invaded by a group
of hackers intent on its destruction. This episode, discussed in Section 9.2, was more
insightful from the viewpoint of sociology and anthropology than from that of science or
engineering.



9.1 The NOISE exhibition 

9.1.1 The exhibition 

The NOISE exhibition was about information and transformation. Various locations in
Cambridge (UK) participated: the Kettle’s Yard university gallery, the Cambridge Whipple
Museum of the History of Science, the Cambridge University Museum of Archaeology
and Anthropology and the Fitzwilliam Museum. The installations ran from 22
January until 26 March 2000. There was also a site in London at the Wellcome Trust
Two10 Gallery in Euston Road, which ran from 28 January to 1 May 2000. Apart from
the usual press coverage for art exhibitions, articles appeared in Science 1 and the Lancet 2
showing that the exhibition also resonated in the scientific press.



1 http://www.firstpulseprojects.com/sciencerev.html 

2 http://www.garnettmckeen.net/lancet.html 

Figure 9.1: View through the street window inside the Kettle’s Yard gallery in Cambridge.
The cameras are located before the towers. The geometric figures are attached
to the wall and explanations of the experiment are located on the right.



The NOISE exhibition was curated by Adam Lowe (see Figure 9.2) and Simon Schaffer.
Adam Lowe is an artist and technologist. He is currently the director of Factum Arte
(Madrid), which specialises in making life-like replicas of paintings, sculptures and
archaeological objects, using laser scanners and 3D printers. Recent realisations include
the sculptures of Giambattista Piranesi (Lowe 2010), the paintings of Caravaggio and the
tomb of Tutankhamun. Simon Schaffer is a science historian and professor at the University
of Cambridge in the Department of History and Philosophy of Science. He has written
extensively about the historical developments in scientific research (Schaffer & Shapin 2011)
and has presented television and radio programs, including Light Fantastic (BBC4).

The exhibition was announced in the following way by its curators: 

A multi-site multimedia exhibition in Cambridge with “realtime” links to London,
organised around three key themes in “digitality”:

Universal Language
Pattern Recognition
Data Synaesthetics

NOISE is not limited to electronic media, but traces the digital imagination
from such myths as Noah’s Ark, through the early modern experiments of Charles
Babbage’s Difference Engine and Morse’s Telegraph, up to today’s charge coupled
devices (CCDs), robotics and beyond ...

Figure 9.2: Adam Lowe (curator of NOISE) during installation of the Talking Heads
experiment at the Wellcome Gallery in London. The back wall contained the
figures. On the right wall, posters were displayed explaining the experiment.

Displays highlight digitality in history, technology, art and science, drawing 
upon a wide range of objects and images from artists and scientists around the 
globe - everything from 3000BC artefacts to the latest state-of-the-art pictures of 
the surface of atoms. 

Not a virtual reality “hall of mirrors”, but a cultural gallery of hard (and fuzzy) 
fact. 

nOlse celebrates the world as signal-and-noise - the constant simultaneous cre- 
ation of content with discontents, as communication society filters “meaningful” 
messages from background “babble” . . . and back again. Ingenuity, serendip- 
ity and excess all play up the sensory wonderment of NOISE: The Digital and Its 
Discontents. 

NOISE is news. It’s the nuisance others make, a cacophony which prevents us 
being heard, or even thinking. Now the big noise is digital, offering us an escape 
from disorder by arranging, preserving and transmitting information. But is the 
cloudless noiseless world of digital technology the truth? 

NOISE, hazy images and sudden sparks, random mutations and puzzling glitches, 
can all become the sources of innovation and beauty. 

NOISE celebrates the essential excess from which information is drawn. It
probes many different ways of seeing and being in the world. Chances are your
own sense of order is already someone else’s NOISE.

9.1.2 Installation at Kettle’s Yard in Cambridge 

The Kettle’s Yard Gallery is associated with Cambridge University. It acted as the main
site of the NOISE exhibition and showed a variety of historical artefacts (including the
brain of Babbage and the original DNA structure built by Watson and Crick) together
with new artistic works. The catalogue published by the Kettle’s Yard Gallery 3 featured
articles by Brian Smith, Umberto Eco, Bruno Latour, Bruce Sterling, Luc Steels,
Peter Weibel and others. Below is my text as it appeared in the catalogue:

Meanings are not a priori Platonic entities independent of language; meanings are 
the result of embodied interactions with the world, obtained via the role words 
play in verbal interactions called “language games”. The Talking Heads experi- 
ment explores one kind of language game: a guessing game played by two robotic 
agents about the scene directly in front of them. One agent acts as Speaker and 
attempts to draw the attention of the Hearer to some object by transmitting a ver- 
bal description of it; the Speaker succeeds if the Hearer correctly guesses which 
object is “meant”. 

To play the game, the Speaker segments the scene and performs pattern recog- 
nition to extract features - area, shape, colour, position - about the segments se- 
lected. The Speaker then conceptualises the focus element “topic” as distinct from 
those of other segments - be it the largest or the furthest to the left or the green 
one. Next the Speaker verbalises this conceptualisation using descriptive words 
selected from its lexicon. The Hearer works the other way around; it queries the 
words transmitted by the Speaker and applies the resultant meanings back to the 
scene to find what topic the Speaker intended. 

To express conceptualisations, agents need a lexicon relating words to mean- 
ings. It must function bi-directionally (words-to-meanings and meanings-to- 
words); it must also store synonyms (more than one word for the same meaning) 
and ambiguities (more than one meaning for the same word). The agents have not
been given a lexicon; they must acquire their own common lexicon as a by-product
of the game.

New words accrue in two ways: either an agent creates its own new words 
from random combinations of syllables; or it stores transmitted words together 
with possible-meaning guesses inferred from the scene, then uses a hypothesis- 
test strategy to render lexicons mutually compatible. Agents keep a running score 
for every word-meaning pair in their lexicon. When word-topic recognition suc- 
ceeds the score for that pair goes up, and that of other alternates goes down. This 
dynamic forces the lexicon of each individual agent to progressively conform, 



3 www.kettlesyard.co.uk/noise 
and keeps it adapting to any language changes or new meanings that need to
be expressed. During the course of the exhibition, a group of robotic agents
autonomously constructs a shared language about real world scenes in front of them.
Humans can interact with the installation through the Internet; they can teach 
their agents words and follow the general progress towards the construction of 
the language. 
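
The score dynamics described in the catalogue text (the score of a successful word-meaning pair goes up and that of its alternates goes down) can be sketched in a few lines of Python. This is only an illustration and not the code used in the experiment: the step size of 0.1, the clamping of scores to the range 0.0 to 1.0, and the choice to treat every pair sharing either the word or the meaning as an "alternate" are all assumptions.

```python
def update_scores(lexicon, word, meaning, delta=0.1):
    """Reward the successful (word, meaning) pair and inhibit its competitors.

    lexicon maps (word, meaning) pairs to scores between 0.0 and 1.0.
    Competitors are assumed to be pairs with the same word or the same meaning.
    """
    for (other_word, other_meaning), score in list(lexicon.items()):
        if (other_word, other_meaning) == (word, meaning):
            lexicon[(other_word, other_meaning)] = min(1.0, score + delta)
        elif other_word == word or other_meaning == meaning:
            lexicon[(other_word, other_meaning)] = max(0.0, score - delta)
    return lexicon
```

Applied repeatedly after successful games, an update of this kind lets one word for a meaning gradually crowd out its synonyms, and one meaning for a word crowd out competing meanings, while unrelated pairs are left untouched.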

Intriguing questions: How to bridge the enormous gap between the noisy real 
world of images and behaviors, and the discrete digital world of symbols and lan- 
guage required for communication and thought? How do language and meaning 
originate? Why do languages keep changing so as to adapt to the needs of their 
users? How can a language be transmitted between generations without any cen- 
tral coordination nor telepathy? 

The installation consists of two computer-controlled robotic camera heads that 
capture images from “scenes” in front of them consisting of colored geometric 
shapes pasted on a magnetic whiteboard. The configurations on the board can be 
changed at any time, making the robots’ world unpredictable and open. 

Two robotic structures will be active in this exhibition: one at Kettle’s Yard 
in Cambridge and another one at the Wellcome Institute in London, along with 
additional installations in Brussels, Paris, Tokyo and elsewhere. A website has 
also been created for the experiment (http://talking-heads.csl.sony.fr/) 4 allowing 
anyone to create new agents. People can teach them words, so that elements of 
human natural languages can sneak into the emerging vocabulary. The agents are 
autonomous and do not necessarily stick to the words given to them but try to max- 
imally adapt to the behavior of the group and invent their own words. Through the 
website it is also possible to monitor the progress of the experiment: the lexicon 
being created, the success rate, the coherence among the agents, the complexity 
of the language, etc. There is a Hall of Fame listing the best speakers and hearers. 
This motivates humans to take care of their agents thus ensuring that they move 
to the top in the Hall of Fame. 

The creation of a shared language by a group of autonomous distributed agents 
is extraordinarily difficult because there are many sources of noise, in the form of 
disturbances that cause incoherence between the agents: 

• Two embodied grounded agents always see the situation from different points 
of view so that they capture different images. Consequently they may have 
divergent perceptions and hence great difficulty to arrive at a successful 
game. 

• The word(s) transmitted may not be accurately produced or received. For 
example, one agent may produce ‘wabaku’ but the other agent may hear 
‘mabaku’. This introduces noise in the signal itself and hence possibly confu- 
sion among the agents. 



• A scene can usually be conceptualised in many different ways, so that there 
is seldom certainty among the agents whether they share the same meaning. 
This causes great difficulties in learning the meaning of unknown words. 

• The lexicons and conceptual repertoires are never exactly the same as each 
agent develops them autonomously. This generates additional sources of 
confusion. 

4 This website is no longer operational but some remnants can be accessed here: https://ai.vub.ac.be/talking- 
heads/ 

Any theory claiming to explain the origins of word-meaning must confront 
the handling of noise head on. 

Noise plays yet another role, namely as a motor of language evolution. Indeed 
natural lexicons evolve - even if there are already perfectly good words in a lan- 
guage. Noise on the word form causes changes in the form which propagate in 
the remainder of the population. Misunderstandings may destabilise a word and 
cause its meaning to shift. 

Another factor in language evolution is due to changes in the environment. 
Thus two alternative meanings for a word may be compatible for a while but are 
then disambiguated when a series of scenes arises in which the two meanings are 
no longer both applicable. For example, all objects may be both green and small 
and therefore there may be a word ‘sesubipu’ which may mean both, until a clear 
situation arises where a green object is no longer the smallest and a misunder- 
standing arises. 

Semiotic evolution is continually present. Different meanings for a particular 
word will emerge over a large number of language games. During specific 
periods, different words dominate. The word ‘droite’ (originally introduced by a 
French speaker) gains the dominant meaning ‘to the right’, then shifts to ‘at the 
bottom’, and then to ‘very much to the right’. Particularly the words introduced by 
humans have a tendency to undergo this kind of strong evolution because human 
users do not know which meanings their agents employ. 

How to bridge the enormous gap between the noisy real world of images and 
behaviors and the discrete, digital world of symbols and language required for 
communication and thought? How do language and meaning originate? How do 
languages keep changing yet remain adapted to the needs of their users? How can 
a language be transmitted between generations without any central coordination 
nor telepathy? 

What is most remarkable about this experiment is that the robotic agents do 
not come with pre-programmed ways of conceptualising reality but have to de- 
velop their own concepts. 

Each agent has been given a mechanism to ‘grow’ new distinctions by expand- 
ing discrimination trees. Each tree discretises one sensory dimension. For example, 
there is a tree for the area of a segment (scale with respect to the image) which 
divides the range of possible values into two discrete regions, which would be 
named in English ‘small’ and ‘large’. Other trees focus on position (left versus 
right or top versus bottom), shape (rectangular or oval), color, etc. Trees can go 
as deep as necessary to carve out smaller and smaller subregions of a continuous 
space. 

The nodes of the discrimination trees grow in a random fashion but the dis- 
tinctions that are not successful in the game are pruned. This way the conceptual 
repertoire of an agent can continue to adapt to the needs of the agent. 

How do agents manage to reach coherence in their lexicons without a central 
coordinator and despite all these sources of noise? The answer is self-organisation. 
Coherence is reached in the same way as an ant society manages to form a coher- 
ent path between a food source and the nest, namely by a positive feedback loop. 
In this case, there is a positive feedback loop between use and success: The more a 
word is successful, the more it is chosen by the agents, and the more success it will 
have. This causes the agents to settle in an attractor where they all prefer the same 
word for the same meaning and vice-versa. We see a damping of synonymy as in 
the case of natural languages. Noise has the beneficial impact of getting agents 
out of attractor states (so called local minima) which are not optimal from the 
viewpoint of the whole although they are a possible solution. 

How do agents manage to share their conceptualisation of the world without 
their concepts being innately given (pre-programmed) nor centrally coordinated? 
The answer is structural coupling, another concept adopted from biology. Two systems 
have a structural coupling if one creates a context for the other and vice-versa, 
so that each system develops to be maximally co-ordinated without any prior design 
or global control. The conceptual system and the lexicon of each agent are 
structurally coupled in the sense that agents prune distinctions that are not successful 
in the language game, and conversely they keep the ones that are useful and 
successful. This makes the conceptual system progressively well adapted both to 
the scenes encountered by the agents and the lexicons used in the group. Sources 
of noise are again beneficial to foster structural coupling. First of all they help 
the group to push towards the use of categorisations that are robust against noise. 
Second, they help agents to explore alternatives and avoid getting stuck in 
sub-optimal behavior. 

A website has been created for the experiment 5 . Through this site, anybody 
who wants can create new agents and follow their progress. Owners of agents can 
teach them words, so that words already used in human natural languages sneak 
into the emerging vocabulary. The agents are autonomous and do not necessarily 
stick to the words given to them but try to maximally adapt to the behavior of 
the group and invent their own words. Through the website it is also possible to 
monitor the progress of the experiment: the lexicon being created, the success rate, 
the coherence among the agents, the complexity of the language, etc. There is a 
Hall of Fame listing the best speakers and hearers. This motivates humans to take 
care of their agents thus ensuring that they move to the top in the Hall of Fame. 



5 http://talking-heads.csl.sony.fr/ 
To express a conceptualisation, agents need a lexicon relating words with 
meanings. The lexicon must be consumable in both directions (from words to meanings 
and from meanings to words). It must be able to store synonyms (more than 
one word for the same meaning) and ambiguities. The agents have not been given 
a lexicon a priori. They have to acquire their own lexicon as a side effect of the 
game. New words get into a lexicon in two ways: When an agent has no word 
to express a particular distinction, the agent can create a new one by a random 
combination of syllables. When an agent hears a word that he does not know, he 
stores the new word with his own guess of what the meaning could be in the scene 
being perceived. 

Agents then use a hypothesise-and-test strategy to make their lexicon compat- 
ible with the rest of the group. They keep a score for every word-meaning pair 
in their lexicon. When a word has success in the game, the score goes up, and 
its competitors go down. When the game fails, the score of the used word(s) goes 
down. This creates an inhibition-excitation dynamics making the lexicon of the in- 
dividual agent progressively conform to the most successful lexicon of the group. 
It also ensures that an agent’s lexicon keeps adapting if the language changes or 
if new meanings need to be expressed.” (NOISE exhibition catalogue) 
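
The score-keeping scheme described at the end of the catalogue text can be illustrated with a short Python sketch. This is not the code that ran in the experiment. The class name `Lexicon`, the step size `delta`, the initial score, the syllable inventory, and the reading of "competitors" as pairs that share either the word or the meaning are all assumptions made for illustration.

```python
import random

# Assumed syllable inventory; the experiment's actual inventory is not given here.
SYLLABLES = ["wa", "ba", "ku", "se", "su", "bi", "pu", "ma"]


class Lexicon:
    """Two-way store of scored word-meaning pairs (illustrative sketch)."""

    def __init__(self, delta=0.1, initial=0.5, rng=None):
        self.delta = delta        # assumed size of each score update
        self.initial = initial    # assumed score of a newly stored pair
        self.scores = {}          # (word, meaning) -> score in [0, 1]
        self.rng = rng or random.Random()

    def add(self, word, meaning):
        """Store a heard word together with a guessed meaning."""
        self.scores.setdefault((word, meaning), self.initial)

    def invent(self, meaning, n_syllables=3):
        """Create a new word as a random combination of syllables."""
        word = "".join(self.rng.choice(SYLLABLES) for _ in range(n_syllables))
        self.add(word, meaning)
        return word

    def best_word(self, meaning):
        """Meaning -> word: pick the highest-scoring synonym, if any."""
        found = [(s, w) for (w, m), s in self.scores.items() if m == meaning]
        return max(found)[1] if found else None

    def best_meaning(self, word):
        """Word -> meaning: pick the highest-scoring reading, if any."""
        found = [(s, m) for (w, m), s in self.scores.items() if w == word]
        return max(found)[1] if found else None

    def reward(self, word, meaning):
        """After a successful game: raise the used pair, lower its competitors."""
        for w, m in list(self.scores):
            if (w, m) == (word, meaning):
                self.scores[(w, m)] = min(1.0, self.scores[(w, m)] + self.delta)
            elif w == word or m == meaning:
                self.scores[(w, m)] = max(0.0, self.scores[(w, m)] - self.delta)

    def punish(self, word, meaning):
        """After a failed game: lower the score of the pair that was used."""
        key = (word, meaning)
        if key in self.scores:
            self.scores[key] = max(0.0, self.scores[key] - self.delta)
```

In this sketch a speaker would call `best_word` (or `invent` when nothing is found), a hearer would call `best_meaning` (or `add` with its own guess), and both would then call `reward` or `punish` depending on the outcome of the game. Repeating this over many games produces the positive feedback between use and success that the text describes.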

9.1.3 Installation at the Wellcome Gallery in London 

The second installation of the NOISE exhibition was set up in the Wellcome Gallery 
in London (see Figure 9.3) from 22 January until 26 March 2000. This gallery is 
associated with the Wellcome Trust and featured additional artworks by Joseph Grigley, 
Evgen Bavcar, Manuel Franquelo, Garret and Jones, and Giles Revell. The local curator 
Denna Jones described the exhibition as follows: 

“Digitality is transforming traditional ways of thinking about the impact of technology 
on culture. This exhibition looks at how complex structures can be transformed, 
translated and transmitted, changing the nature of communication. A 
multimedia multi-site exhibition, NOISE demonstrates how language and our five 
senses can be changed or enhanced through ‘digitality’, and introduces visitors to 
pioneering developments in cross-disciplinary art and science.” 

The installation itself was similar to the one in Cambridge. There was a wall with 
geometric figures pasted on it, posters explaining the exhibition, the two pan-tilt cameras 
mounted on tripods, and the computers driving the software (see Figure 9.4). The London 
site posed particularly hard problems in the alignment of the cameras. It turned out that 
a subway line passed under the gallery and caused strong vibrations every few minutes, 
which made the cameras physically shift on the floor. Because the pointing behaviour 
was sensitive to alignment and prior calibration, this led to growing pointing errors and 
subsequent errors in feedback, causing a strong decline in the success rate and occasional 
chaos in the agents’ vocabularies. This was partially offset by stable conditions at other 
sites but nevertheless made the task of reaching coherence virtually impossible. 
Figure 9.3: Catalogue cover and poster of the NOISE exhibition at the Wellcome Gallery 
in London. 




Figure 9.4: Installation at the Wellcome Gallery in London. Left: Talking Heads cameras 
oriented towards the wall on which geometric figures were pasted. Right: 
Projection of interaction during ongoing experiment. A game just failed and 
the speaker says “no”. 
The exhibition catalogue, assembled by Denna Jones, contained the following text by 
Luc Steels: 

“The Talking Heads Experiment is a collective effort of members of the Sony Com- 
puter Science Laboratory, Paris and VUB Artificial Intelligence Laboratory Brus- 
sels, in particular Luc Steels, Frederic Kaplan, Angus McIntyre, and Johan Van 
Looveren. This research has been sponsored by the Sony Computer Science Lab- 
oratory in Paris and a GOA grant from the Belgian government to the VUB AI 
Lab. 

1. How can a cognitive agent bridge the enormous gap between the noisy real 
world of images and behaviours and the discrete, digital world of symbols 
and language required for communication and thought? 

2. How do language and meaning originate? How come languages keep chang- 
ing and how do they remain adapted to the needs of their users? How can a 
language be transmitted between generations without any central coordina- 
tion nor telepathy? 

Situated robots and teleports 

The installation consists of two robotic heads. These are steerable cameras con- 
trolled by a computer that hosts the architecture and knowledge state of each 
agent. The robots capture images from scenes in front of them. The scenes consist 
of coloured geometrical shapes pasted on a magnetic white board. The configura- 
tion on the board can be changed at any time, making the robots’ world unpre- 
dictable and open. The robot infrastructure is connected to the Internet, so that 
an agent may dematerialise from a body and travel over the internet to another 
body in which it can re-materialise. Two robotic structures will be active as part 
of the NOISE exhibition: one at Kettle’s Yard in Cambridge and another one at the 
Wellcome Trust Two10 Gallery in London. 

There will be additional installations in Brussels, Paris, Tokyo and other places. 
The agent teleporting facility makes it possible to have thousands of robotic agents 
and to confront each agent with many different scenes. 

Constructing perceptually grounded concepts 

What is most remarkable about this experiment is that the robotic agents do not 
come with pre-programmed ways of conceptualising reality but have to develop 
their own concepts. Each agent has been given a mechanism to ‘grow’ new distinc- 
tions by expanding discrimination trees. Each tree discretises one sensory dimen- 
sion. For example, there is a tree for the area of a segment which divides the range 
of possible values into two discrete regions, which would be named in English 
‘small’ and ‘large’. Other trees focus on position (left versus right or top versus 
bottom), shape (rectangular or oval), colour, etc. 

Trees can go as deep as necessary to carve out smaller and smaller subregions 
of a continuous space. The nodes of the discrimination trees grow in a random 
fashion but the distinctions that are not successful in the game are pruned. This 
way the conceptual repertoire of an agent can continue to adapt to the needs of 
the agent. 

Sources of noise 

The creation of a shared language by a group of autonomous distributed agents is 
extraordinarily difficult because there are many sources of noise, in the form of 
disturbances that cause incoherence between the agents: 

• Two embodied grounded agents always see the situation from different points 
of view so that they capture different images. Consequently they may have 
divergent perceptions and hence great difficulty to arrive at a successful 
game. 

• The word(s) transmitted may not be accurately produced or received. For 
example, one agent may produce ‘wabaku’ but the other agent may hear 
‘mabakau’. This introduces noise in the signal itself and hence possibly con- 
fusion among the agents. 

• A scene can usually be conceptualised in many different ways, so that there 
is seldom certainty among the agents whether they share the same meaning. 
This causes great difficulties in learning the meaning of unknown words. 

• The lexicons and conceptual repertoires are never exactly the same due to the 
fact that each agent develops them autonomously. This brings in additional 
sources of confusion. 

Any theory claiming to explain the origins of word meaning must confront the 
handling of noise head on. 

Cultural evolution 

Noise plays yet another role, namely as a motor of language evolution. Indeed 
natural lexicons evolve - even if there are already perfectly good words in a lan- 
guage. Noise on the word form causes changes in the form which propagate in 
the remainder of the population. Misunderstandings may destabilise a word and 
cause its meaning to shift. 

Another factor in language evolution relates to changes in the environment. 
Thus two alternative meanings for a word may be compatible for a while but are 
then disambiguated when a series of scenes arises in which the two meanings are 
no longer both applicable. For example, all objects may be both green and small 
and therefore there may be a word ‘sesubipu’ which may mean both, until a clear 
situation arises where a green object is no longer the smallest and a misunder- 
standing arises.” (Catalogue NOISE exhibition) 
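
The discrimination trees mentioned in the catalogue text can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the implementation used in the experiment: the names `Node`, `grow`, and `discriminate`, the scaling of each sensory channel to the interval [0, 1], the binary split at the midpoint, and the pruning criterion (a pair of leaves that never contributed to a successful game) are all hypothetical choices.

```python
import random


class Node:
    """One region [lo, hi) of a sensory channel scaled to [0, 1] (assumed)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.children = []     # empty, or the two halves of this region
        self.successes = 0     # games in which this distinction was useful

    def contains(self, value):
        # The upper bound 1.0 is included so the whole channel is covered.
        return self.lo <= value < self.hi or (value == self.hi == 1.0)

    def split(self):
        """Refine this region into two equal subregions."""
        mid = (self.lo + self.hi) / 2
        self.children = [Node(self.lo, mid), Node(mid, self.hi)]

    def categorise(self, value):
        """Return the nodes containing value, from coarsest to finest."""
        path = [self]
        for child in self.children:
            if child.contains(value):
                path.extend(child.categorise(value))
                break
        return path

    def prune(self):
        """Remove leaf pairs that were never successful in a game."""
        for child in self.children:
            child.prune()
        if self.children and all(
            c.successes == 0 and not c.children for c in self.children
        ):
            self.children = []


def leaves(node):
    """Collect all leaves below (and including) node."""
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


def grow(tree, rng):
    """Grow the tree in a random fashion by splitting a randomly chosen leaf."""
    rng.choice(leaves(tree)).split()


def discriminate(tree, topic, others):
    """Return the coarsest region holding the topic but no other object."""
    for node in tree.categorise(topic):
        if not any(node.contains(o) for o in others):
            return node
    return None
```

For example, with a single tree for area, `discriminate(tree, 0.2, [0.8])` fails on an unsplit tree because both objects fall in the same region. After one call to `grow`, the topic falls in the lower half, which would be named ‘small’ in English, and the other object in the upper half, ‘large’. An agent would increment `successes` on a node whenever a game using that distinction succeeds, and call `prune` from time to time.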
9.2 Iconoclasm 

The second series of Talking Heads experiments, which was part of the NOISE exhibition, 
again featured a website through which human users could create their own agents, teach 
them words by going through images of past games, and send them off on the teleportation 
network. In the first series, a large group of users participated, posting enthusiastic 
commentaries on the forum, suggesting improvements to the interface, and discussing 
possible theories of language evolution. Unfortunately, during the second series, a group 
of students mostly from the University of Hull (UK) evolved from enthusiastic and interested 
participants into a mob of rude thugs that wanted to destroy the experiment at all 
costs, stimulated by local curator Denna Jones and insiders at the Wellcome Gallery, who 
apparently were strongly opposed to the NOISE exhibition taking place at their location 
and somehow had a personal grudge against Adam Lowe, global curator of the NOISE 
initiative. This iconoclastic event was (in the year 2000) a forerunner of the damage 
that hackers have been inflicting on the web, destroying the spirit of collaboration and 
sharing with which the web was founded. 

The British hackers realised that they could give dirty names to their agents and, because 
they could teach their own agents these words, they could also teach their agents 
dirty words for colours, shapes, or any other concept that agents were using. These 
words would unavoidably propagate in the population. As these hackers were extremely 
active, creating many agents, continuously launching them to different sites, and teaching 
their agents words, the global vocabulary progressively became unacceptable. The 
installations were in public spaces visited by school children, and so concern grew among 
the exhibition organisers at other locations (except the London site, where those responsible 
for the exhibition actively encouraged this destructive behaviour). In response, 
the experiment was temporarily halted and provisions were put in place to prevent a small 
group from having excessive influence. It was still possible to give a name to your agent and 
teach your agents words, but some form of decency had to be respected. This change restored 
order but resulted in an overreaction on the part of the English hacker group, who 
now used all possible means to attack the Talking Heads servers themselves, encouraged 
by Denna Jones and aided by others at the Wellcome Gallery. 

This episode showed a phenomenon that a decade later has become very common. The 
web is far from an idealistic common ground through which people can exchange ideas 
and tools. It brings out the worst in some people, particularly if there is a mob effect 
in which different individuals with unstable ethical values push each other to do things 
they would otherwise not do. 

It is instructive and rather fascinating, particularly from a sociological and anthropo- 
logical point of view, to follow the dialog on the Talking Heads Website Forum between 
the main protagonists of the story. It went from enthusiastic interaction and experimen- 
tation to aggressive and hateful destruction. The misspellings, grammatical errors and 
foul language they produced have been left in the text. Only a small fraction of the dialog 
is reproduced here. 
We pick up the dialog on 25 February 2000. Until that date the forum was very 
active both with general discussions about the future of intelligence or the origins of lan- 
guage, and with specific questions, mostly why agents were so slow in playing games. 
Many users were too impatient and did not seem to realise that playing a language game 
could easily take a minute or two. However the major problem was the deviating align- 
ment of the cameras, which had to guarantee that there was a common frame of refer- 
ence between the agents, and hence the possibility of sharing attention. The cameras 
were mounted on tripods. When somebody accidentally moved the tripod, the frame of 
reference was no longer exact. As mentioned earlier, traffic (particularly subway traffic 
in London) caused vibrations of the cameras so that they kept shifting and getting out 
of balance. 

A group of University of Hull students (with names like Trash, yeah8a8y, schedski) 
became very much involved. They communicated through the Forum, often using slang 
and sexist language and making a surprising number of spelling errors and grammatical 
mistakes. The group had been trying to impose their own language by very actively 
teaching their agents and sending them to the same site, namely Paris, which they had 
noticed had the most stable operating conditions. This worked (as indeed it was supposed 
to) at which point they decided to do the same with the London site although the camera 
alignment was too unreliable to allow the evolution of a stable language. 



2000-02-25 21:37:24 Yeah8a8y 

Paris 

"Hey there peeps... Thanks to the greatness of yeah8a8y, Paris 
has a great success ratio... Oh yeah with a little help from 
TRASH. ... Ok a lot of help. . . Heres to Paris! ! ! 

The next server to be conquered is London! ! !" 



2000-02-25 21:40:13 Yeah8a8y 

RE: Paris 

And you may notice the succes of a few of our words on the 
Lexicon.... God the greatness of a couple of wasters from Hull 
Uni, three days to conquer a server... Didn't think much to 
those Frenchies anyway! ! ! ! ! 



2000-02-25 21:52:23 Trash 

RE: Paris 

"Yeah London being the capital of the greatest country in the 
world and currently last in the server league well if we can get 
them frogs to say words the brits WILL CONQUER 



2000-02-25 21:55:49 Trash 

yellow MK 1 ESCORT 

Hey mate nice seeing you around paris ! ! ! ! Enjoy the company 
especially since you know most of our words AND get the Biatches 
right. . . you can have my sister anyday. . . so then mate see ya 
round keep driving the crap car. Have noticed the absence of 
fine ass girls in here am i wrong.... hope soooo. 



At this point, they realised that the London alignment was off and this is the first 
time that Phlox (as Denna Jones called herself) intervened. She immediately shows a 
negative attitude towards the experiment triggered by her personal conflict with Adam 
Lowe. Colleagues of hers (fish and A Londoner) respond with inside joke remarks that the 
Hull group does not understand. Fish had been entrusted with admin passwords because 
he was responsible for maintaining the London site, but he greatly abused this trust as 
the experiment proceeded. 



2000-02-25 22:02:36 LondonCalling 

CALIBrat ioN OF CAMERAS London 

"Can London PLEASE calibrate their cameras???? PLEASE!!!!! 



2000-02-26 16:52:32 Yeah8a8y 

RE: CALIBrat ioN OF CAMERAS London 

very good. . . . Absolutely no chance of Domination if the cameras 
arn't calibrated... SORT IT!! 



2000-03-02 15:46:50 phlox 
RE: CALIBrat ioN OF CAMERAS 

i manage the two10 gallery in which the talking heads display is 
currently installed, would that it were so simple to sort it 
and keep the cameras correctly calibrated, all equipment was 
installed by the brussels group and whenever the cameras need 
calibrating - for whatever reason - it means one of the brussels 
boffins has to pop over to fix it. they've been over once since 
the exhibition opened and decided to move the cameras forward 2 
feet, not really an exact science is it? the grand master - luc 
steels - is coming to the gallery tomorrow, so perhaps he will 
enlighten me as to why the cameras seem to need constant 
calibration, and i'll let you know, cheers. 



2000-03-02 20:04:52 fish 

RE: CALIBrat ioN OF CAMERAS London 

that's as maybe, phlox, but i'm guessing you secretly come into 
the exhibition and kick the cameras when you're in a bad mood, 
like say when someone has been horrid to you. 



2000-03-03 20:08:36 schedski 

RE: CALIBrat ioN OF CAMERAS London 
wassup fish? 
2000-03-03 23:06:43 fish 

RE: CALIBrat ioN OF CAMERAS London 

come on, admit it, you guys there spend all day stomping around 
your exhibits, breaking the displays, unplugging the monitors 
and generally mucking things up. i'm surprised london's still 
got two cameras even working and they haven't already been 
flogged down some boozer for a monkey and a couple of jellied 
eels 



2000-03-07 13:03:46 A Londoner 

RE: CALIBrat ioN OF CAMERAS London 

Well, if you people will confuse the poor things with your 
outrageous diction, they might know which way to point... One 
of them was practically in tears the other day as I watched it 
shaking its head in disbelief at the language it was meant to 
repeat. What ever happened to 'Adam's a cool bloke'? 



Meanwhile the Hull group is pursuing and explaining their particular experiment that 
they carry out within the Talking Heads framework. The experiment itself is interesting 
and shows that they understand what is going on and are creative. Also the overall 
dynamics is working. The group realises that a language can form on its own or that 
it can be influenced by humans teaching agents their own language. At the same time, 
there is already one user (called Cheesy) who is pushing to introduce dirty words to the 
agents. He does not belong to the Hull group but is most probably somebody from the 
Wellcome Gallery already trying to put the NOISE exhibition in a bad light. 



2000-02-25 22:55:28 Norton 

Teaching 

While participating in this experiment seemed quite interesting 
- I'm simply not getting it. When I try to comprehend what is 
meant by certain words so that I can teach my agents - I cannot 
understand the distinctions made. Does anyone get it? 



2000-02-26 16:47:55 Yeah8a8y 

RE: teaching 

The best thing to do is to leave them for a number of games... 
Like the tips say 'bout 50. . . then start looking at the words 
they use, have to be successful tho' . . . then teach your other 
bots this word for that picture! It is a bit hit and miss tho' 
coz a lot of words are ambigious (?Sp) and mean different 
things... remeber it is actual properties and not usually 
shapes... But you can have success! Check out some of mine and 
Trash's word in the top 20 lexicon, 

Hullcitynutter , eightyf if tyone, msixtytwo, wotsit, mamorys 
(Childish I know, but hey I'm a student) and you'll realise you 
can actually do it . . . Helps if your at Paris tho' ! 



2000-02-26 20:41:20 Trash 

RE: teaching / TRIBES 

Basically think about it like this... On the Earth there are 
many different languages spoken by people usually in a specific 
geographical location. Now myself and yeahbaby looked at this 
experiment (as we are very interested in AI and have done much 
work in this field) and decided to test the idea of "Tribes" 
where all those in that tribe speak the same lingo. We also 
figured that tribes rarely move out of their "Birth Location" so 
kept them at one site. As can be seen now ALL our Agents speak 
the same language and have even taught it to others outside the 
Tribe (such as the esteemed Yellow Mk 1 Escort and possibly even 
Anne's little Agent) . These agents we considert to be 
f orreigners ! ! But stilll they can communicate with our Agents... 
If you have noticed our words in the lexicon as explained by 
yeahbaby have very few deviations from what was intended unlike 
others such as "Gumble" which appears to mean Everything! ! ! ! ! 
This I do deleave Validates mine and my colleagues (Yeahbaby) 
opinion that language can only evolve in the pressence of small 
but tightknit communities. 

As the next stage we wish to see what happens when 2 "Tribes" 
get together Hopefully they become multilingual ! ! . My theory 
is thatconfusion of words will be short lived and all will be 
ironed out after only a few games . Especially if the "tribes" 
are allowed to consolidate they meanings by once again returning 
to their own community with little outside influence.... 

I am sure you will all agree that personel experiments like 
these improve this and we would be interested to hear any 

opinion from others are we doing the right thing or just 

playing gods etc??? Does this prove evolution as a basis or is 
there a need for gods etc. . . . 

Also anyone interested in settingg up a tribe PLEASE talk to us 
first as we have much experiance in these matters! 



2000-02-27 14:39:10 Oisin 

RE: teaching / TRIBES 

Perhaps tribes could be created by sending agents to a specific 
server only eg only paris or only brussels and to see what words 
develop at the separate servers then switch after 2,4,6 months 
maybe any suggestions? 



2000-02-27 19:39:10 Trash 

RE: teaching / TRIBES 

You mean by not playing god ???? and letting them get on with it 
at one site?? .... well yes but the server would have to be 
locked down by that tribe otherwise you would get outside 
inflences teaching them stupid like gumble which meanbs 
everything... it would probably take just 100-300 games each 
agent to develop a usable language (Aslong as the server has no 
agents from outside the tribe in there) 

By teaching them myslef and Yeahbaby got usable language going 
in less than 2 days! !!!!!! Just see the success of most of our 
words in the lexicon top 20! ! 

God us 2 are good! ! 



2000-02-27 19:39:55 smartypants 

RE: Collusion Tribes 

I think the theories that are starting to surface is the most 
interesting part of the experiment for me. I started ignoring 
the 'teaching' mails as they seemed concerned with the technical 
side but thanks Trash for pointing out you had posted up your 
theory (p.s. when you talk about definitions of your words where 
are these, or are you just talking about studying the images?) . 
The bit that I would like more explanation of ('cause 
Smartypants is an ironic name) is the bit about the two tribes 
coming together, especially as Paris is really the only viable 
option at the moment. I noticed that Virtuoso (the only one of 
my robots that managed to get into Paris in the last day) was 
unranked yesterday, but today has jumped to 140ish as both as 
speaker and a hearer - I guess this is because that one 'tribe' 
have been forced to cement that language. 

The trouble is, without a good interpreter (us I suppose), won't 
it just cause confusion in our robots when they go and meet the 
other 'dumber' ones? i.e. they teach the others our words, but 
the others teach them the wrong words back as well. This means 
you would keep playing several iterations of the games. 

My thoughts on collusion were centered around this - we could 
reduce the iterations by all agreeing on common words to teach 
our robots, thus perpetrating more of the 'right' words whilst 
eliminating the 'wrong' words... 

. . . any thoughts on my incoherent ramblings?? 



2000-02-27 19:52:20 Trash 

RE: Collusion Tribes 

At the minute me and Yeah are off the Paris server (we are 
regrouping our thoughts for further action! ! !) But yes I did 
mean view the pictures when i said definition. 

By 2 tribes I mean obviously two who have been taught on the same 
server. So say that i train my agents on there this week and you 
train your tribe next week on the same server with different 
words when both tribes are capable of good communication we 
then take half of each tribe and place them together in the 3rd 
week on the Paris server and see if they become multi lingual. 
We then sned these linguists back to their own tribe and see 
what happens . 

Unfortunately many people have tried to jump on the band wagon 
of Paris success and send their Agents there with no idea what 
they are doing . . . this has meant that we can not get our tribe 
to dominate as we would like 

The problems with organising tribe transfer is immense and it is 
a shame people don't exp [lain what they are doing on this fprum 
so that all could help out by say teaching their agents the 
words being used or even evacuating a site so that two tribes 
can collide . . . 

It is a real shame that an experiment into AI and Language is 
hampered because people are unwilling to communicate with each 
other .... 

Well I guess that those of us who want a proper experiment 
should just ry their best in difficult circumstances... 

About the wrong words being taught ! ! ! If your Tribe is more 
than 60% of the server then your tribe generallly succeeds in 
teaching the "foreigners" with little effect on themselves ... 
This has more to do with Probability than anything else. 



2000-02-27 20:46:29 smartypants 

RE: Collusion Tribes 

Thanks Trash. I was thinking in different terms... 

. . . seeing as anyone doing anything is on the Paris server, I 
was imagining that as one 'big' tribe and that any outsiders 
(stuck on the other servers, or new) are the 'other' tribes so 
thanks for clearing that up for me. 

I ' d be interested in working with you and Yeahbaby for a more 
meaningful experiment, particularly as loads of 'dumber' robots 
(by that I mean ones not enjoying the Paris success) will soon 
be unleashed on us . 

Just to clarify, my plan up until now was to let my robots learn 
40-50 words (as suggested in the tips) , I started to teach them 
the most successful words from the lexicon yesterday (most of 
which were yours - grrrrrr, jealous!) and then I sent them to 
Paris. (This was before I started putting stuff up on the forum 
and saw your theory unfortunately) . 

The only one that got in was Virtuoso, and even then he didn't 
really get to play enough games as I was experimenting to see if 
the number of games you choose has any effect on getting into 
that server. 

Anwyay, let me know if you're interested (I've only created 7 at 
the moment ) . 

By the way, do I remember seeing somewhere that you're both 
girls????? 

... ' cause I am . . . 
If so it is interesting to see different ways people approach 
the experiment. If I was running the experiment I reckon I could 
do research from the forum, as well as the actual experiment. 



2000-02-28 14:25:01 Yeah8a8y 

RE: Collusion Tribes 

At the moment it seems I can't get on the Paris server for love 
nor money. . . Never mind. . . Anyway Trash has good theories! ! ! 

Two particully good tribes at the minute, which also 'collided' 

( Wait a minute, Frankie goes to Hollywood anyone? ) on the 
Paris server are run by me and Trash. There are some other out 
there (Cheesy) that are more content teaching obscene word to 
their bots. Anyway by both me and him filling up Paris (Max. of 
16 agents from 20 at one point) a successful dialect has been 
taught between them. We now rank 5 words in top 10, and 9 
overall in top 20. Again Trashs theories ring true. 

And the bit about words getting taught wrong is more a case of... 
well if a french person tuaght you the word for "Tower" and you 
where listening to someone who understood french, "Tower" would 
be used in the french wording, but you also retain your 
interpretation of "Tower" and use this one which you consider 
more successful in the speaking sense. This is what happened 
with our own "hullcitynutter" and "lafizana". My agents now 
understand both, but only seem to say "hullcitynutter" (Thier 
original word, not the learnt one) 

This, of course, all goes out the window when you yourself 
teachs them. . . Maybe another bad aspect of this AI programming. 



2000-02-28 16:24:47 Yeah8a8y 

RE: Collusion Tribes 

OK no . . . I'm no girl! ! ! But there you go . . . I'm a bloke 



2000-02-28 17:04:23 Yeah8a8y 

RE: Collusion Tribes 

I'm posed with another problem... As you know Brussels and 
Paris have been inactive for a bit. . . do I keep with the project 
and just send the bots out to Paris... or use this as a change 
point and educate them in Brussel's... 

I'm going to stay with Paris... Teach other people the use of 
the words carefully set out by me and Trash... and generally 
gather more success! 

I now have 18 bots by the way most are ranked within the top 
100.. 1 or 2 exeptions... anyway I have been successful by 

sending the all to Paris... for top gamage action 
It usaully takes all night but there you go. . . even the Earth 
took an entire week to sculpture. 



To deal with the problem of agent congestion, software issues, and camera misalignment 
at the London and Cambridge sites, maintenance was carried out which led to a 
temporary unavailability of the experiment. Surprisingly, the reaction of the Hull group 
was entirely negative as they saw it as a way to prevent them from gaining or keeping 
control of the language used by the agents, even though Angus McIntyre clearly and 
patiently explained why maintenance was necessary. Also, Denna Jones (Phlox) took it 
personally as she did not realise that the calibration errors were due to vibrations caused 
by traffic of the London Underground near and under the Wellcome Gallery. 



2000-03-01 15:28:10 Angus McIntyre 

Performance, problems and fixes 

As some of you have noticed, there 've been some problems with 
this round of the Talking Heads experiment. For one thing, 
success rates have generally been very low, because the language 
has never properly stabilised. For another, a large backlog of 
agents has built up, and there have been considerable delays in 
getting certain agents to and from servers. 

We are aware of these problems, and are actively working on 
fixing them. Part of the problem is that the Talking Heads has 
been a victim of its own success - lots of people participating 
enthusiastically makes for lots of agents, with new ones being 
added every day. Moreover, the Talking Heads is a 'real world' 
experiment, with real physical moving parts (the cameras) which 
means that each game takes a certain and non-trivial amount of 
time. These are just two reasons why things may sometimes move 
slowly in the world of the Talking Heads. 

Problems have also been caused by the cameras losing calibration 
(that evil 'real world' strikes again), so that our agents 
sometimes seem to be looking at entirely different parts of the 
scene, something which is bound to cause problems. Last but not 
least, there turn out to have been some bugs in our software, 
particularly in the area of learning. The good news is that 
we've identified a number of things that may have an impact on 
success, and are currently busy fixing them. 

We're about to start applying some fixes and making some changes 
to the way things work. We hope that there will be minimal 
disruption, but it's possible that the system may be a bit 
'up-and-down' over the next few days. We may also start imposing 
more limits, for instance on the number of agents that each user 
can make, and on the number of agents that can land on a site at 
any one time. (When I talk about 'imposing' limits, I don't 
mean that we're going to hunt you down and kill you if you make 
too many agents, but we might ask you politely to be a little 
bit restrained when it comes to creating new agents or sending 
them off) . 
We hope that once we've made the changes, things should start to 
work better. In the meantime, we'd just like to apologise for 
any frustration or inconvenience, and to thank you all for 
taking part in the experiment . 

Angus McIntyre Talking Heads Current Affairs Correspondent 



2000-03-01 16:29:54 Trash 

RE: Performance, problems and fixes 

Hi Angus, (Are you Threatening me ??? Bunghole ! ! ! ! I am 

Cornholio) 

Ok I take the limited number of agents business is directed 
against my plans for world domination.... Well then its a 
fight! ! ! lol. . . 

Yeah take your point but just trying to create order out of 
anarchy. . . . strange really when i is an anarchist at heart. . . 

At last you is taking an interest in sorting out the 
problems/cheats used by people like me to pervert the way the 
system is run. Well in the best style of Hull University's 
Electronic Engineering Department you stop the cheats and I'll 
create new ones ! ! . . . only kidding mate . . . but world domination 
is mine . . . 

One request though is if you could have many smaller servers . . . . 

I'm sure this would speed up the learning process 

May i suggest waterpistols at dawn for the fight??? 

Bungholio .... 

Laters mate . . 

P.S What do the scots know about language??! ! !??? 



2000-03-01 16:09:48 Marvin 

Reduced server list 

I notice that only Brussels and Paris are now on the server list. 
Has Cambridge been removed all together ??? 

I see there have been problems, but reducing the number of 
server's down to just two, increases the load. Could you not 
just add more server's, and spread the load around ? 

So far I have two agents, waiting since 23rd Feb to get into 
Brussels . " 



2000-03-02 22:21:25 Angus McIntyre 

RE: Reduced server list 

Cambridge and London have been temporarily taken offline because 
of problems with the alignment of their cameras. If the cameras 
drift too far out of alignment, the agents end up looking at 
totally different things, making it impossible for them to agree 
on a topic of conversation. Under such circumstances, a 
language can't form and any agent that ends up on such a site 
will come away deeply confused. While it's true that taking 
sites offline throws a greater load on the remaining sites, in 
this case it seemed like the lesser of two evils. 

We hope to restore service on these sites within the next few 
days, and to take steps to prevent a recurrence of the problem. 
We are looking into the possibility of adding some more sites to 
the network, but this depends not just on the availability of 
equipment but also on finding generous, public-spirited people 
who are prepared to find space to set up a Talking Heads 
installation and devote some of their time to keeping it running 
smoothly. We do have a few candidates in mind, however (some of 
whom don't even yet know that they're candidates, he said with a 
sinister laugh.) 

Angus McIntyre 

Talking Heads Junior Camera Joggler 



2000-03-03 13:46:06 phlox 

RE: Reduced server list 

so just how do you think the cameras "drift out of alignment"? 
and what steps do you plan to take to prevent recurrence? ask me 
to place a cctv camera watching your cameras so we can see what 
naughty person is touching them?? slap my wrists for not 
maintaining proper control in the gallery? hmmmm?? (ps - see my 
posting yesterday in response to yeah8a8y and london calling's 
messages of 25th and 26th) 



2000-03-07 00:56:52 Martin 

Nowhere to launch! ! ! 

All my agents are at home and I can launch them. . .nowhere! ! ! ! ! 



2000-03-07 13:30:22 Yeah8a8y 

RE: Nowhere to launch! ! ! 

Doh! ! ! . . . What do you think it says on the server page? 

"Due to essential maintance the interactive part of the site 
will be turned off" 

Hence no launching of agents... 



2000-03-07 15:08:20 phlox 

RE: Nowhere to launch! ! ! 

cant you play nice? do you have to be rude to everyone on the 
site? 



2000-03-07 16:32:44 Yeah8a8y 

RE: Nowhere to launch! ! ! 

Yeah. . . shitface. . . 

Nah anyway. . . I was pointing out the obvious! ! ! 

How is that message rude??? 

I dunno, givce a guy the ability to write and he thinks he's a 
philosopher ! ! ! 

Chris 

Resident Hull Uni director of derogatory comments 



The fact that servers went offline was partly a technical matter, because a new site 
was being linked in from Amsterdam. But this was seen once more as a negative action 
and it triggered a call to start introducing foul language, both by giving obscene names 
to agents and by teaching them obscene words. 



2000-03-07 20:44:09 Oisin 

RE: Nowhere to launch! ! ! 

I agree, our poor agents have no swearwords how arn the 
procrastinate against the stupider agents. TEACH YOUR AGENTS 
SWEARWORDS NOW! !!!!!!! 



2000-03-07 22:10:09 Yeah8a8y 

RE: Nowhere to launch! ! ! 

All hail "cheesyslurpscum" ! ! ! Very good aimed at the 
cheesymeister himself... "hullsuckx" indeed!!! 

Chris 

Resident Hull Uni director of Poonani 



Once the site came on-line, we began to remove the swearwords based on complaints 
from the public sites where the experiment was shown (in particular from Cambridge). 
This was the chance for Denna Jones (alias phlox) to stimulate attacks on the experiment’s 
servers by playing on the sentiment of the players that there was a higher authority 
impinging on their freedom. The Hull hackers then started to divert the php 
script of the agents so that they could reschedule agents, encouraged by fish who had 
admin rights to the London site. However, they became suspicious of fish (who also had 
created an agent with the name Francis Crick) because they realised he had these admin 
rights and therefore should (normally) have been part of the crew that maintained 
the overall system. Denna Jones tried to quell this suspicion by denying that he was an 
insider. 



2000-03-08 13:22:44 phlox 

this is rigged 

new agents cant be launched because all sites are in the process 
of being "flushed' by angus, luc steels et al of the non-lexicon 
made up words and theyre now busy re-installing their master 
lexicon, this is the real reason no one can launch new agents 
from any of the sites, so what is the point of this game? if 
players cant create their own language - and we can only use the 
master lexicon - why bother?? 



2000-03-08 13:39:58 Yeah8a8y 

BahhhU! 

Well if I can't play fair I just might as well tip the board 
over eh? 

If this is the only way they can stabilise AI, I feel sorry for 
the poor fools that buy a Cyberdog. . . It'll be in and out of the 
repair shop more times then a real dog shits in the street ! ! ! 

So the big brass can't play eh? Well there's me thinking this 
was a valid experiement . . . Yaknow, God playing and that shite... 
Well, I'll be going now! ! ! 

Chris 

Resident Hull Uni's deluded scientist" 



2000-03-08 14:53:31 Trash 

RE: this is rigged 

Who can't get in to servers ?? There are ways around everything. 



2000-03-08 14:57:22 phlox 

RE: this is rigged 

good, well see to it you stop the powers that be from trying to 
regulate the game. 



2000-03-08 15:23:06 TRASH!!!!! 

RE: this is rigged 

My good friend . . . who you all know has got into this piece of — 
and made it so we (Me and him) can still launch! ! ! look for us 
on the servers and also nice bots called nice things about 
Angus . . . 

Thats what Mr Scotish bloke gets from messing with a person who 
has developed his skills instead of working at this esteemed 

UNI .... 

Hull remains the forefront of Electronic skill.... 

Angus and his fellow friends at sony. . . your good kid but while 
ever the hull crew is i town you'll always be number 2 
As i write this I beleave that sony will once again allow us to 
launch in the conventional way . . . we shall see Until then just 
watch the best at work 



2000-03-08 15:33:00 fish 

RE: Hull5 Sony 0 

big hand to the hull boys, you may be a bit sensitive to 
personal insults but atleast you can kick ass when it needs 
it 

already seen what you're doing in brussels and we love it! 



2000-03-08 15:50:55 phlox 

RE: Hull 5 Sony 0 

glad to see someones stopping Sony's ethnic cleansing of our 
dots, excellent. 



2000-03-08 15:51:52 Yeah8a8y 

In t' kingdon of t' blind, t' 1-eyed man is king 

Thanks Fish, been rumbled by Francis Crick tho'... I think he 

has administrative capabilities... either 

that or he's another of us here HACKDEMONS . . . . HAHAHAHA 

Chris 

Resident Hull Uni worshipper of all thing Satanic and Electronic 



2000-03-08 15:54:17 phlox 

RE: In t' kingdon of t' blind, t' 1-eyed man is ki 
i know who francis crick is. and as he's good mates with luc and 
angus - they probably gave him admin rights, believe me, he's 
no hackdemon .... 



2000-03-08 16:01:19 Trash 

Ahhh thats why! ! 

Oh nice to see that ... I thought that you just hated asomeone 
caled adam wasn't aware that adam IS A CUNT. . . may have to 
reopinionate myself with you . . . nice on efella you deserve 
respect 

Hull Uni Coordionator of Total System Breakdown 



The story continues as the Hull hackers entice Denna Jones (Phlox) to stand in front 
of one of the talking heads cameras so that they could see her. She actually does, to great 
acclaim of the hackers - who seem surprised she is a woman. Denna Jones keeps further 
encouraging the Hull hackers. And from here on, the tone gets increasingly aggressive 
as loopholes are closed to prevent manipulation of the experiment and attacks on the 
servers. There are again suspicions against fish who clearly has access from the inside 
using a password only given to trusted collaborators but which he abuses for destroying 
the experiment. The group is also now beginning to communicate directly instead of 
through the Forum associated with the Talking Heads website. 



2000-03-08 17:23:07 Trash 

Phloxy Lady 

Hey ahhhhhhhhhhhhhh stop the loving start the warring 
trash 

Dropouts Director of Assault Forces Against Sony Talking Heads 



2000-03-08 17:31:27 phlox 

RE: Phloxy Lady 
too right, do it. 



2000-03-08 20:50:28 fish 

RE: OK try not to :) 

now, i dont think it is angus mucking us all around - it's not 
his style, it's far more likely to be someone else who's maybe a 
bit annoyed at having agents being taught certain words....? 



2000-03-08 21:03:19 Trash 

who is on whos side?? 

Just a question how you met angus, adam and olive???? 

Second ques... How you get on?? do you do it the same way as 
us??? 

(Please email fire-for-effect@hotmail.com with this one!!) 
Finally to get straight to the point are you one of them??? dun 
dun dungggggggggggggg 

Trash (Hull Uni's Commander of Conspiracy Theories)" 



2000-03-08 23:14:10 fish 

RE: who is on whos side?? 
i am most definitely NOT one of them! 



2000-03-09 14:26:15 fish 

We are continuing to test software fixes... 

"yeah, right, and i'm a jelly called Tracy why dont you just 
admit that you're trying to censor the site and rid it of all 
unwanted terms? infact, since you're so busy playing with 
yourselves, why not just take the whole thing offline and carry 
on in private in your labs? 



2000-03-09 15:14:28 Yeah8a8y 

Yeah how about it ! ! 

Then Luc and his mates can all go off. . . With huge wood. . . Coz 
they screwed someone off a site for something inoccuous 



2000-03-09 15:52:11 Trash 

RE: Yeah how about it! ! 

It matters not who screws who at thsio point the fact remains if 
they want to run this experiment their way and by their rules 
they should set it up purely in their own labs .... without us 
having access if they want us to contribute then they should let 
us do it our way . . 

Hey Luc and all the rest of you get on this forum and have the 
balls to talk to us" 



2000-03-09 16:13:23 phlox 

we're still waiting. 

come on sony own up. if youre gonna exercise stalinistic 
control at least do it up front; not under the guise of testing 
software; . the only agents on site at the mo are owned by luc, 
angus, adam and joris. if it's in the public domain - and it is 
- then let the public play! 



The postings on the Forum keep going in crescendo for a while until the exhibition 
ends in March 2000. 

Many aspects of this episode are remarkable, not least that those responsible 
for an exhibition (in this case Denna Jones, alias phlox, and “fish”) were bent on creating 
a wave of negative reactions and destruction against one of the exhibition pieces entrusted 
to them. Apparently this behaviour was triggered by a conflict with Adam Lowe, 
curator of the NOISE exhibition, but the team behind the Talking Heads experiment 
had nothing to do with this. Another remarkable fact is that the hacker group became 
aggressive as soon as they felt they were no longer able to exert control the way they 
wanted to, specifically to introduce disrespectful language or to subvert agent scripts to 
circumvent the central scheduler so that they could send their agents in priority to the 
sites of their choosing. This style of behaviour is a personality trait which is commonly 
recognised as characteristic of hackers, including by members of the hacker community 
themselves: 6 

“Hackers have relatively little ability to identify emotionally with other people. ... 
Unsurprisingly, hackers also tend towards self-absorption, intellectual arrogance, 
and impatience with people and tasks perceived to be wasting their time. Because 
of their passionate embrace of (what they consider to be) the Right Thing, hackers 
can be unfortunately intolerant and bigoted on technical issues, in marked contrast 
to their general spirit of camaraderie and tolerance of alternative viewpoints 
otherwise. ... As a result of all the above traits, many hackers have difficulty 
maintaining stable relationships. At worst, they can produce the classic geek: withdrawn, 
relationally incompetent, sexually frustrated, and desperately unhappy when not 
submerged in his or her craft.” 

Why did we not interfere in what was going on or directly defend our points of view 
on the Forum? There were two reasons. First, our team was small and engaged with 
many other activities. The destructive activities of Denna Jones and her friends were not 
really worth our continuous attention and precious time. Second, this whole episode was 
yielding significant data about how individuals behave with respect to artificial agents 
and about the personality traits of those who are most likely to engage with them. The 
most obvious conclusion from these data is that we should not attempt to launch such 
experiments in the public domain, not because they are not feasible from a technical point 
of view, particularly today with much more reliable web technologies, but because of the 
way that (some) individuals are likely to behave with respect to these technologies. The 
job of figuring out how to create artificial societies and cultures that can cooperate with 
human societies is an unsolved problem and it will require help from anthropologists 
to figure out how it can be set up. Undoubtedly anonymity is one of the main sources of 
deviant behaviour (Knight, Dunbar & Power 1999). 

6 Raymond, E. (2013) The new hackers dictionary. Available as http://www.catb.org/jargon/html/index.html. 



9.3 Installation at the Palais de la Decouverte in Paris 

As the NOISE exhibition was in full swing, the Palais de la Decouverte, the largest science 
museum in Paris, took the initiative to integrate the experiment as part of their running 
exhibition for several months. This led to the design of a sophisticated framework for 
housing the computer equipment (see Figure 9.5), additional educational materials 
explaining what the experiment was about, and a new run in much more relaxed 
circumstances with therefore much more interesting results. This new installation started its 
operation during the social dinner for the Evolution Of Language Conference in Paris 
organised by Jean-Louis Dessalles on 5 April 2000. It was seen by an estimated 300,000 
visitors during its run and an article with results of this experiment appeared in 
the “Revue du Palais de la Decouverte” (Steels & Kaplan 2000). 

9.4 The portable Talking Heads 

As news of the Talking Heads experiment was spreading, more and more inquiries were 
made to show the experiment live in other locations. So we made a portable version 
(see Figure 9.6) that could easily be installed and assembled. Initially this version was 
used to link into the live teleportation infrastructure, but once the NOISE exhibition was 
finished, it was used to develop focused experiments, in particular on colour language. 

Some of the noteworthy locations where the portable installation was used are the 
following: 

1. The European Conference on Artificial Life at the EPFL in Lausanne (Switzerland) 
in September 1999. Papers on computational and robotic models of language evolution 
were (and still are) greeted with hostility at linguistics conferences (even 
conferences on computational linguistics) and they are still routinely rejected by 
linguistics journals, as being irrelevant for understanding more about human language. 
However, the Artificial Life conferences and journals welcomed the approach 
from the beginning. This is not surprising because agent-based modelling is 
one of the main tools used in that field and an evolutionary stance is seen as obvious 
by biologists. The Lausanne conference was organised in line with earlier conferences 
on artificial life, showing work on life-like robots, computer simulations, 
and new chemically based forms of life. It also featured a series of live demos. The 
portable demonstration of the Talking Heads was part of these demonstrations, set 
up to accompany a presentation at the conference by myself and Frederic Kaplan 7. 
It was linked in with the ongoing experiment during Laboratorium in Antwerp. 

Figure 9.5: Talking Heads installation at the Palais de la Decouverte. It featured new 
fancy structures that made the installation more attractive visually. Daily 
explanations were given to visitors by the staff of the Palais de la Decouverte. 

Figure 9.6: Portable installation with two pan-tilt cameras and a portable computer that 
was able to run the TH software. 

Figure 9.7: Live installation and demonstration of the Talking Heads experiment at the 
European Conference on Artificial Life in 1999. 

2. The Neuer Aachener Kunstverein in Aachen (Germany) in collaboration with the 
RWTH (the technical university of Aachen) organised a general exhibition on modelling 
called Modell-Modell. Within this framework, artist Anne-Mie van Kerckhoven 
invited me to cooperate in a laboratory on language and colour called “Cyberlabor 
Chromosophy”. As part of this laboratory, the portable Talking Heads 
experiment was installed and ran from 5 May to 16 June 2000. It featured not only 
the portable Talking Heads, which was demonstrated live, but also posters, talks 
about the project and additional art works. 

3. The portable Talking Heads was also featured in an exhibition at the Ludo Mich 
Gallery in Antwerp. This exhibition focused on colour again and showed several 
other pieces related to colour and colour perception. It was accompanied by a very 
well attended gallery talk. 



9.5 Look into the Box 

The Musee d'Art Moderne in Paris organised a solo exhibition of artist Olafur Eliasson 
entitled “Chaque matin je me sens different. Chaque soir je me sens le meme” between 
22 March and 12 May 2002 (Scherf 2002). Olafur Eliasson is known for his thorough 
investigations of colour, such as using monochromatic light to create an artificial sun 
in the Tate Modern in London (2003). Eliasson invited me to jointly work out a 
new interpretation of the Talking Heads experiment, which became “Look into the Box”. 
Nicolas Neubauer, a master's student working at that time at the Sony Computer Science 
Laboratory, was the chief implementer together with Angus McIntyre. The set-up and 
results are described in detail in Steels (2004) and Neubauer (2004). 

7 The paper published for this conference is Steels & Kaplan (1999). 

Figure 9.8: Live installation at the gallery of Ludo Mich in Antwerp. There was no white 
board but different small placards with possible scenes (see in the middle of 
the picture). The game was displayed much larger against the back wall. 

“Look into the Box” consisted of a box in which a camera was mounted that would take 
a picture of the eye of a person who looked into the box. The eye was then projected 
much bigger on a wall opposite to the box, which in itself gave a very strong visual 
effect (see Figure 9.9). At the same time two artificial agents looked at the eye colour and 
played a colour language game. Visitors to the exhibition could hear the dialog between 
the agents and follow on a nearby screen the progression in the emergence of a colour 
vocabulary. During the course of the exhibition an artificial colour language emerged, 
reflecting the eye and skin colour of visitors. 




Figure 9.9: Look into the Box installation at the Paris Museum of Modern Art. Left: Box 
with camera inside. A lens would make the eye look bigger for the camera. 
Right: Projection of the eye on a big screen. 

This experiment was fundamentally different from the original Talking Heads experiment 
because it was no longer a discrimination game but a description game. The agents 
extracted the main colours from the image of the eye and then described them to another 
agent. Moreover the domain was now restricted to the colour domain, which had 
meanwhile become a focal topic of research in the lab (Steels & Belpaeme 2005). 
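
The description game can be illustrated with a small sketch. It is not the software that 
ran in the installation; it only shows the two steps named above under loudly stated 
assumptions: the "main colours" are found here with a tiny k-means clustering over RGB 
pixels, the speaker names each colour with the closest prototype in its lexicon (inventing 
a word when none is close enough), and the hearer adopts the word when it fails to 
recognise it. The names dominant_colours, Agent, play_description_game, the syllable list 
and the distance threshold are all hypothetical choices made for this illustration. 

```python
import math
import random


def dominant_colours(pixels, k=3, iterations=10, seed=0):
    """Tiny k-means over RGB tuples (an assumed stand-in for colour extraction)."""
    rng = random.Random(seed)
    # Start from k distinct pixel values; requires at least k distinct colours.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


class Agent:
    """Agent with a lexicon mapping words to prototype colours (hypothetical design)."""

    SYLLABLES = ["ba", "ki", "lu", "mo", "ne", "ta"]

    def __init__(self, threshold=60.0):
        self.lexicon = {}           # word -> prototype RGB colour
        self.threshold = threshold  # maximum distance for a word to apply

    def closest_word(self, colour):
        """Return the word whose prototype is nearest to colour, if close enough."""
        best = min(self.lexicon.items(),
                   key=lambda item: math.dist(item[1], colour),
                   default=None)
        if best is not None and math.dist(best[1], colour) <= self.threshold:
            return best[0]
        return None

    def describe(self, colour, rng):
        """Speaker role: reuse a fitting word or invent a new one."""
        word = self.closest_word(colour)
        if word is None:
            word = "".join(rng.choice(self.SYLLABLES) for _ in range(3))
            while word in self.lexicon:  # avoid reusing an existing word
                word = "".join(rng.choice(self.SYLLABLES) for _ in range(3))
            self.lexicon[word] = colour
        return word

    def recognises(self, word, colour):
        """Hearer role: does the word fit the colour the hearer sees?"""
        prototype = self.lexicon.get(word)
        return prototype is not None and math.dist(prototype, colour) <= self.threshold


def play_description_game(speaker, hearer, colours, rng):
    """One round: the speaker describes every colour, the hearer adopts on failure."""
    results = []
    for colour in colours:
        word = speaker.describe(colour, rng)
        success = hearer.recognises(word, colour)
        if not success:
            hearer.lexicon[word] = colour  # simplified adoption, no scores
        results.append(success)
    return results
```

With two agents that start with empty lexicons, the first round over a set of clearly 
distinct colours fails for every colour, because the hearer has never heard the invented 
words, and a second round over the same colours succeeds. The sketch leaves out word 
scores, competition between synonyms and any adaptation of the colour categories 
themselves. 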

The Look into the Box installation was shown again at several locations. One was 
in July 2003 in the context of a yearly music festival in Spoleto. In 2003, the theme 
of semiotic dynamics was chosen and various presentations and discussions were held 
curated by the Italian semiotician Paolo Fabbri. A new installation of Look into the Box 
was realised and operated (see Figure 9.11). A system for playing language games with 
human users, created by Tony Belpaeme of the VUB AI Lab, was also demonstrated. This 
system was intended to collect data about colour category prototypes and names from 
human speakers in order to gather more data about colour language in discrimination 
games. 

Figure 9.10: Example interaction of the “Look into the Box” art piece. Left: Top and 
pixelated eye of a spectator. Right: the main colours that were extracted from 
this image. Agents played language games to describe these eye colours. 




Figure 9.11: Left: My presentation at the Spoleto Science Festival. Right: Nicolas 
Neubauer debugging the Look into the Box installation. 

The Look into the Box installation was also shown during the “Intensive Science” 
exhibitions organised in October 2006 on the occasion of the 10th anniversary of the Sony 
Computer Science Laboratory. It was first shown in Paris at the exhibition space “La 
Maison Rouge” near the Bastille, which featured science/art installations and 
collaborations involving members of the Sony Computer Science Laboratory, including 
work by Atau Tanaka, Francois Pachet in collaboration with Jazz pianist Albert van Veendaal, 
Peter Hanappe in collaboration with photographer Armin Linke, and Frederic Kaplan in 
collaboration with the Design School of EPFL in Lausanne. The same exhibition travelled 
to Tokyo where it was shown in the Sony Explorascience museum from 22 December 
2006 to 12 February 2007 (see Figure 9.12). 




Figure 9.12: Installation at the Tokyo Explorascience Museum. The box is in the left
corner. The eye is projected on a big screen. A smaller screen shows the image
captured by the camera, the colours recognised, and the words used by the
agents.



9.6 Conclusions 

The many installations and experiments taught us a lot about how to technically
set up and maintain a distributed network of agents grounding their activities
in the real world through cameras. These pioneering experiments happened at a time
when the Internet was neither as common nor as stable as it is today. It was before
uploading, downloading and apps became widespread, and before sufficient bandwidth was
available even in private homes. The experiments were stopped around 2002 because they
took a lot of time and not much more could be learned from a technical and scientific
point of view.

The installations were also among the pioneering attempts to develop strong
interactions between art and science, which are more common and more recognised today
than they were in the late 1990s. They were conceived to be part of exhibitions, and
well-established artists, such as Olafur Eliasson, took great interest and collaborated
to give them a twist so that they functioned better within an art context. An exhibition
is a very effective way to reach an audience that normally has no access to ongoing
scientific experiments. The exhibitions also generated a stream of newspaper articles and
contributions to radio and television programmes, which normally pay little attention
to what goes on in science.

The experiments also taught us a lot about the interaction between humans and artificial
agents. The idea that a benign symbiosis between artificial agents and humans was
possible proved to be naive. There are too many people around with sufficient computer
skills but without any ethical consciousness. They see no problem in destroying
computational infrastructure simply for their own enjoyment. More recent examples show
that such hackers go much further, using their power to do harm or commit widespread
theft. As artificial systems will never be immune to this behaviour, it is probably not
possible to imagine a future in which there are autonomous robotic agents, not because
the robots would be harmful in themselves but because of the use some individuals will
most likely make of them.

Finally, the experiments taught us that the proposed mechanisms for lexicon and concept
formation and for reaching coherence worked. The lateral inhibition learning
dynamics and the structural coupling between lexicon formation and concept formation
used by the agents later proved relevant not only for lexicon formation but also
for grammar. On the other hand, these dynamics are not robust enough in the face of
extreme uncertainty coming from embodiment (i.e. camera misalignment), high population
flux, uncertain environmental conditions or destructive human user interventions.
Presumably, in such conditions child language and concept learning would also be heavily
compromised and perhaps impossible. Anthropologists have argued that the origins of
language required a strong form of sociality which is not found in other primate species
(Knight & Lewis 2014). The experiment indeed confirms that without empathy, respect
and a common purpose, a shared language will not get off the ground.
10 Beyond the Talking Heads experiment



The Talking Heads experiments were obviously not an end point, but only the beginning.
They confirmed the results that had been obtained with earlier theoretical models and
computer simulations but did not push these models towards greater complexity.
Subsequent work, however, did expand the envelope of our understanding and modelling well
beyond these boundaries, thanks to the many researchers who have since joined
the research programme. The expansion has happened along several dimensions:
increased sophistication of the robots, a deeper theoretical understanding of semiotic
dynamics, growth in the complexity of semantics and grammar, and new breakthroughs from
studying the emergence and evolution of language strategies. An exhaustive survey of
these exciting and profound developments is beyond the scope of the present book. Instead,
I will highlight here only some of the key language game experiments that used real
physical robots and were direct variations or further extensions of the environments, game
scripts and strategies used in the Talking Heads experiment. This chapter focuses on
experiments in the period before 2005, particularly work with the AIBO robots and the first
attempts towards the emergence of grammar.¹ The next chapters focus on experiments
after 2005.



10.1 Experiments with the AIBO robots

From 1996, even before the first Talking Heads experiment was started, experiments were
already being conducted in our laboratory, primarily by Paul Vogt, exploring language games
on the cybernetic mobile robots that we were able to build ourselves at the time. These
robots were constructed from Lego bricks and used a basic processing board for directly
linking simple sensors (touch sensors, infrared sensors) and actuators (left and right
motors) using an adaptive dynamical system (see Figure 1.3). They were, however, too
unreliable for long-term repeatable experimentation, and the sensori-motor experiences
were too restricted to hope for the development of interesting languages. The use of
pan-tilt cameras, as in the Talking Heads experiment, was an attempt to have an experimental
set-up at relatively low cost that was reliable and used vision as the source of information
about the world. Of course, this came at the price of less mobility and no true physical
interaction with the real world. Nevertheless, many further fruitful experiments were
done using the same set-up, particularly to explore in much greater depth the domain of



¹ An overview of these experiments is also given in Steels & Belpaeme (2005).




